POSITIONAL SEQUENCING BY HYBRIDIZATION

Reference to Related Applications

This is a continuation-in-part of United States patent application serial
number 07/972,012 filed November 6, 1992.

Background of the Invention

1. Field of the Invention 

This invention relates to methods for sequencing nucleic acids by
positional hybridization and to procedures combining these methods with more
conventional sequencing techniques and with other molecular biology techniques
including techniques utilized in PCR (polymerase chain reaction) technology. Useful
applications include the creation of probes and arrays of probes for detecting,
identifying, purifying and sequencing target nucleic acids in biological samples. The
invention is also directed to novel methods for the replication of probe arrays, to the
replicated arrays, and to diagnostic aids comprising nucleic acid probes and arrays useful
for screening biological samples for target nucleic acids and nucleic acid variations.

2. Description of the Background 

Since the recognition of nucleic acid as the carrier of the genetic code, a
great deal of interest has centered around determining the sequence of that code in the
many forms in which it is found. Two landmark studies made the process of nucleic acid
sequencing, at least with DNA, a common and relatively rapid procedure practiced in
most laboratories. The first describes a process whereby terminally labeled DNA
molecules are chemically cleaved at single base repetitions (A.M. Maxam and W.
Gilbert, Proc. Natl. Acad. Sci. USA 74:560-564, 1977). Each base position in the nucleic
acid sequence is then determined from the molecular weights of fragments produced by
partial cleavages. Individual reactions were devised to cleave preferentially at guanine,
at adenine, at cytosine and thymine, and at cytosine alone. When the products of these
four reactions are resolved by molecular weight, using, for example, polyacrylamide gel
electrophoresis, DNA sequences can be read from the pattern of fragments on the
resolved gel.

The second study describes a procedure whereby DNA is sequenced using
a variation of the plus-minus method (F. Sanger et al., Proc. Natl. Acad. Sci. USA
74:5463-67, 1977). This procedure takes advantage of the chain terminating ability of
dideoxynucleoside triphosphates (ddNTPs) and the ability of DNA polymerase to
incorporate ddNTP with nearly equal fidelity as the natural substrate of DNA
polymerase, deoxynucleoside triphosphates (dNTPs). A primer, usually an
oligonucleotide, and a template DNA are incubated together in the presence of a useful
concentration of all four dNTPs plus a limited amount of a single ddNTP. The DNA
polymerase occasionally incorporates a dideoxynucleotide which terminates chain
extension. Because the dideoxynucleotide has no 3'-hydroxyl, the initiation point for the
polymerase enzyme is lost. Polymerization produces a mixture of fragments of varied
sizes, all having identical 3' termini. Fractionation of the mixture by, for example,
polyacrylamide gel electrophoresis, produces a pattern which indicates the presence and
position of each base in the nucleic acid. Reactions with each of the four ddNTPs
allow one of ordinary skill to read an entire nucleic acid sequence from a resolved gel.

Despite their advantages, these procedures are cumbersome and 
impractical when one wishes to obtain megabases of sequence information. Further, 
these procedures are, for all practical purposes, limited to sequencing DNA. Although
variations have developed, it is still not possible using either process to obtain sequence
information directly from any other form of nucleic acid.

A new method of sequencing has been developed which overcomes some
of the problems associated with current methodologies wherein sequence information
is obtained in multiple discrete packages. Instead of having a particular nucleic acid
sequenced one base at a time, groups of contiguous bases are determined simultaneously
by hybridization. There are many advantages including increased speed, reduced
expense and greater accuracy.

Two general approaches of sequencing by hybridization have been
suggested. Their practicality has been demonstrated in pilot studies. In one format, a
complete set of 4^n oligonucleotides of length n is immobilized as an ordered array on a
solid support and an unknown DNA sequence is hybridized to this array (K.R. Khrapko et
al., J. DNA Sequencing and Mapping 1:375-88, 1991). The resulting hybridization
pattern provides all n-tuple words in the sequence. This is sufficient to determine short
sequences except for simple tandem repeats.
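
The idea of a hybridization pattern as the set of n-tuple words can be illustrated with a short sketch. It is only a model, not part of the disclosed method: the function name `spectrum` is hypothetical, hybridization is assumed to be error-free, and each word stands in for the array probe complementary to it.

```python
def spectrum(target, n):
    """Return the set of all n-tuple words (length-n substrings) in target."""
    return {target[i:i + n] for i in range(len(target) - n + 1)}


# The 3-tuple words of a short sequence.
print(sorted(spectrum("ATGCA", 3)))

# Two simple tandem repeats of different length yield the same word set,
# so the idealized pattern alone cannot distinguish them.
print(spectrum("ACACAC", 3) == spectrum("ACACACAC", 3))
```

The second comparison shows why simple tandem repeats are the stated exception: the word set records which words occur, not how many times.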

In the second format, an array of immobilized samples is hybridized with
one short oligonucleotide at a time (Z. Strezoska et al., Proc. Natl. Acad. Sci. USA
88:10,089-93, 1991). When repeated 4^n times for each oligonucleotide of length n, much
of the sequence of all the immobilized samples would be determined. In both
approaches, the intrinsic power of the method is that many sequenced regions are
determined in parallel. In actual practice the array size is about 10* to 10*. 

Another powerful aspect of the method is that information obtained is 
quite redundant, especially as the size of the nucleic acid probe grows. Mathematical
simulations have shown that the method is quite resistant to experimental errors and
that far fewer than all probes are necessary to determine reliable sequence data (P.A.
Pevzner et al., J. Biomol. Struc. & Dyn. 9:399-410, 1991; W. Bains, Genomics
11:295-301, 1991).

In spite of an overall optimistic outlook, there are still a number of 
potentially severe drawbacks to actual implementation of sequencing by hybridization.
First and foremost among these is that 4^n rapidly becomes quite a large number if
chemical synthesis of all of the oligonucleotide probes is actually contemplated. Various
schemes of automating this synthesis and compressing the products into a small scale
array, a sequencing chip, have been proposed.

A second drawback is the poor level of discrimination between a correctly
hybridized, perfectly matched duplex and an end mismatch. In part, these drawbacks
have been addressed at least to a small degree by the method of continuous stacking
hybridization as reported by K.R. Khrapko et al. (FEBS Lett. 256:118-22, 1989).
Continuous stacking hybridization is based upon the observation that when a
single-stranded oligonucleotide is hybridized adjacent to a double-stranded
oligonucleotide, the two duplexes are mutually stabilized as if they are positioned
side-to-side due to a stacking contact between them. The stability of the interaction
decreases significantly as stacking is disrupted by nucleotide displacement, gap, or
terminal mismatch. Internal mismatches are presumably ignorable because their
thermodynamic stability is much less than perfect matches. Although promising, a
related problem arises which is the inability to distinguish between weak, but correct
duplex formation, and simple background such as non-specific adsorption of probes to
the underlying support matrix.

A third drawback is that detection is monochromatic. Separate sequential
positive and negative controls must be run to discriminate between a correct
hybridization match, a mismatch, and background.

A fourth drawback is that ambiguities develop in reading sequences longer
than a few hundred base pairs on account of sequence recurrences. For example, if a
sequence the same length as the probe recurs three times in the target, the sequence
position cannot be uniquely determined. The locations of these sequence ambiguities
are called branch points.
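
As a rough illustration of how sequence recurrences arise, the following sketch lists every probe-length word that occurs more than once in a known test sequence, together with its positions. The function name `recurring_words` and the `min_count` threshold are assumptions made for this example; the text above does not prescribe a procedure for locating branch points.

```python
from collections import defaultdict


def recurring_words(target, n, min_count=2):
    """Map each length-n word occurring at least min_count times to its start positions."""
    positions = defaultdict(list)
    for i in range(len(target) - n + 1):
        positions[target[i:i + n]].append(i)
    return {word: pos for word, pos in positions.items() if len(pos) >= min_count}


# Words of probe length 3 that recur three or more times in the target.
print(recurring_words("ATGATGATG", 3, min_count=3))
```

Longer probes reduce the number of such recurrences, which is one reason array size and read length trade off against each other.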

A fifth drawback is the effect of secondary structures in the target nucleic 
acid. This could lead to blocks of sequences that are unreadable if the secondary
structure is more stable than occurs on the complementary strand.

A final drawback is the possibility that certain probes will have anomalous
behavior and, for one reason or another, be recalcitrant to hybridization under whatever
standard sets of conditions are ultimately used. A simple example of this is the difficulty
in finding matching conditions for probes rich in G/C content. A more complex
example could be sequences with a high propensity to form triple helices. The only way
to rigorously explore these possibilities is to carry out extensive hybridization studies
with all possible oligonucleotides of length n, under the particular format and conditions
chosen. This is clearly impractical if many sets of conditions are involved.

Among the early publications which appeared discussing sequencing by
hybridization, E.M. Southern (PCT application no. WO 89/10977, published November
16, 1989; which is hereby specifically incorporated by reference) described methods
whereby unknown, or target, nucleic acids are labeled, hybridized to a set of
oligonucleotides of chosen length on a solid support, and the nucleotide sequence of the
target determined, at least partially, from knowledge of the sequence of the bound
fragments and the pattern of hybridization observed. Although promising, as a practical
matter, this method has numerous drawbacks. Probes are entirely single-stranded and
binding stability is dependent upon the size of the duplex. However, every additional
nucleotide of the probe necessarily increases the size of the array fourfold, creating a
dichotomy which severely restricts its plausible use. Further, there is an inability to deal
with branch point ambiguities or secondary structure of the target, and hybridization
conditions will have to be tailored or in some way accounted for for each binding event.

R. Drmanac et al. (U.S. Patent No. 5,202,231; which is specifically
incorporated by reference) is directed to methods for sequencing by hybridization using
sets of oligonucleotide probes with random sequences. These probes, although useful,
suffer from some of the same drawbacks as the methodology of Southern (1989), and
like Southern, fail to recognize the advantages of stacking interactions.

K.R. Khrapko et al. (FEBS Lett. 256:118-22, 1989; and J. DNA
Sequencing and Mapping 1:375-88, 1991) attempt to address some of these problems
using a technique referred to as continuous stacking hybridization. With continuous
stacking, conceptually, the entire sequence of a target nucleic acid can be determined.
Basically, the target is hybridized to an array of probes, again single-stranded, denatured
from the array, and the dissociation kinetics of denaturation analyzed to determine the
target sequence. Although also promising, discrimination between matches and
mismatches (and simple background) is low, and further, as hybridization conditions are
inconstant for each duplex, discrimination becomes increasingly reduced with increasing
target complexity.



Summary of the Invention

The present invention overcomes the problems and disadvantages
associated with current strategies and designs and provides new methods for rapidly and
accurately determining the nucleotide sequence of a nucleic acid by
methods of positional sequencing by hybridization.

One embodiment of the invention is directed to arrays of 4^R different
nucleic acid probes wherein each probe comprises a double-stranded portion of length
D, a terminal single-stranded portion of length S, and a random nucleotide sequence
within the single-stranded portion of length R. These arrays may be bound to solid
supports and are useful for determining the nucleotide sequence of unknown nucleic
acids and for the detection, identification and purification of target nucleic acids in
biological samples.

Another embodiment of the invention is directed to methods for creating
arrays of probes comprising the steps of synthesizing a first set of nucleic acids each
comprising a constant sequence of length C at the 3'-terminus, and a random sequence
of length R at the 5'-terminus, synthesizing a second set of nucleic acids each comprising
a sequence complementary to the constant sequence of the first nucleic acid, and
hybridizing the first set with the second set to form the array.

Another embodiment of the invention is directed to methods for creating 
arrays of probes comprising the steps of synthesizing a set of nucleic acids each
containing a random internal sequence of length R flanked by the cleavage sites of a
restriction enzyme, synthesizing a set of primers each complementary to a non-random
sequence of the nucleic acid, hybridizing the two sets together to form hybrids, extending
the sequence of the primer by polymerization using the nucleic acid as a template, and
cleaving the hybrids with the restriction enzyme to form an array of probes with a
double-stranded portion and a single-stranded portion and with the random sequence
within the single-stranded portion.

Another embodiment of the invention is directed to replicated arrays and
methods for replicating arrays of probes, preferably on a solid support, comprising the
steps of synthesizing an array of nucleic acids each comprising a constant sequence of
length C at a 3'-terminus and a random sequence of length R at a 5'-terminus, fixing the
array to a first solid support, synthesizing a set of nucleic acids each comprising a
sequence complementary to the constant region of the array, hybridizing the nucleic
acids of the set with the array, enzymatically extending the nucleic acids of the set using
the random sequences of the array as templates, denaturing the set of extended nucleic
acids, and fixing the denatured nucleic acids of the set to a second solid support to
create the replicated array of probes. The replicated array may be single-stranded or
double-stranded, it may be fixed to a solid support or free in solution, and it is useful
for sequencing, detecting or simply identifying target nucleic acids.

The array is also useful for the purification of nucleic acid from a complex
mixture for later identification and/or sequencing. A purification array comprises
sufficient numbers of probes to hybridize and thereby effectively capture the target
sequences from a complex sample. The hybridized array is washed to remove non-target
nucleic acids and any other materials which may be present and the target sequences
eluted by denaturing. From the elution, purified or semi-purified target sequences are
obtained and collected. This collection of target sequences can then be subjected to
normal sequencing methods or sequenced by the methods described herein.

Another embodiment of the invention is directed to nucleic acid probes
and methods for creating nucleic acid probes comprising the steps of synthesizing a
plurality of single-stranded first nucleic acids and a plurality of longer single-stranded
second nucleic acids wherein each second nucleic acid comprises a random
terminal sequence and a sequence complementary to a sequence of the first nucleic
acids, hybridizing the first nucleic acids to the second to form partial duplexes having
a double-stranded portion and a single-stranded portion with the random sequence
within the single-stranded portion, hybridizing a target nucleic acid to the partial
duplexes, optionally ligating the hybridized target to the first nucleic acid of the partial
duplexes, isolating the second nucleic acid from the ligated duplexes, synthesizing a
plurality of third nucleic acids each complementary to the constant sequence of the
second nucleic acid, and hybridizing the third nucleic acids with the isolated second
nucleic acids to create the nucleic acid probe. Alternatively, after formation of the
partial duplexes, the target is ligated as before and hybridized with a set of
oligonucleotides comprising random sequences. These oligonucleotides are ligated to
the second nucleic acid, the second nucleic acid is isolated, another plurality of first
nucleic acids is synthesized, and the first nucleic acids are hybridized to the
oligonucleotide ligated second nucleic acids to form the probe. Ligation allows for
hybridization to be performed under a single set of hybridization conditions. Probes
may be fixed to a solid support and may also contain enzyme recognition sites within
their sequences.

Another embodiment of the invention is directed to diagnostic aids and 
methods utilizing probe arrays for the detection and identification of target nucleic acids
in biological samples and to methods for using the diagnostic aids to screen biological
samples. Diagnostic aids as described are also useful for the purification of identified
targets and, if desired, for their sequencing. These aids comprise probes, solid supports,
labels, necessary reagents and the biological samples.

Other advantages of the invention are set forth in part in the description
which follows, and in part, will be obvious from this description, or may be learned from
the practice of this invention. The accompanying drawings, which are incorporated in
and constitute a part of this specification, illustrate and, together with this description,
serve to explain the principle of the invention.



Brief Description of the Drawings 

Figure 1   Energetics of stacking hybridization. Structures consist of a long target
           and a probe of length n. The top three samples are ordinary hybridization
           and the bottom three are stacking hybridization.
Figure 2   (A) The first step of the basic scheme for positional sequencing by
           hybridization depicting the hybridization of target nucleic acid with probe
           forming a 5' overhang of the target.
           (B) The first step of the alternate scheme for positional sequencing by
           hybridization depicting the hybridization of target nucleic acid with probe
           forming a 3' overhang of the probe.
Figure 3   Graphic representation of the ligation step of positional sequencing by
           hybridization wherein hybridization of the target nucleic acid produces
           (A) a 5' overhang or (B) a 3' overhang.
Figure 4   Preparation of a random probe array.
Figure 5   Single nucleotide extension of a probe hybridized with a target nucleic
           acid using DNA polymerase and a single dideoxynucleotide.
Figure 6   Preparation of a nested set of targets using labeled target nucleic acids
           partially digested with exonuclease III.
Figure 7   Determination of positional information using the ratio of internal label
           to terminal label.
Figure 8   (A) Extension of one strand of the probe using a hybridized target as
           template with a single deoxynucleotide.
           (B) Hybridization of target with a fixed probe followed by ligation of
           probe to target.
Figure 9   Four color analysis of sequence extensions of the 3' end of a probe using
           three labeled nucleoside triphosphates and one unlabeled chain
           terminator.
Figure 10  Extension of a nucleic acid probe by ligation of a pentanucleotide 3'
           blocked to prevent polymerization.
Figure 11  Preparation of a customized probe containing a 10 base pair sequence
           that was present in the original target nucleic acid.
Figure 12  Graphic representation of the general procedure of positional sequencing
           by hybridization.
Figure 13  Graphical representation of the ligation efficiency of positional
           sequencing. Depicted is the relationship between the amount of label
           remaining over the total amounts of label in the reaction, versus NaCl
           concentration.
Figure 14  A diagrammatic representation of the construction of a complementary
           array of master beads.



PCT/US93/10616

WO 94/11530



Description of the Invention 

The present invention overcomes the problems and disadvantages 
associated with current strategies and designs and provides new methods and probes, 
new diagnostic aids and methods for using the diagnostic aids, and new arrays and 
methods for creating arrays of probes to detect, identify, purify and sequence target
nucleic acids. Nucleic acids of the invention include sequences of deoxyribonucleic acid
(DNA) or ribonucleic acid (RNA) which may be isolated from natural sources,
recombinantly produced, or artificially synthesized. Preferred embodiments of the
present invention are probes synthesized using traditional chemical synthesis, using the
more rapid polymerase chain reaction (PCR) technology, or using a combination of
these two methods.

Nucleic acids of the invention further encompass polyamide nucleic acid
(PNA) or any sequence of what are commonly referred to as bases joined by a chemical
backbone that have the ability to base pair or hybridize with a complementary chemical
structure. The bases of DNA, RNA, and PNA are purines and pyrimidines linearly
linked to a chemical backbone. Common chemical backbone structures are deoxyribose
phosphate and ribose phosphate. Recent studies demonstrated that a number of
additional structures may also be effective, such as the polyamide backbone of PNA
(P.E. Nielsen et al., Sci. 254:1497-1500, 1991).

The purines found in both DNA and RNA are adenine and guanine, but
others known to exist are xanthine, hypoxanthine, 2- and 1-diaminopurine, and other
more modified bases. The pyrimidines are cytosine, which is common to both DNA and
RNA, uracil found predominantly in RNA, and thymidine which occurs exclusively in
DNA. Some of the more atypical pyrimidines include methylcytosine,
hydroxymethylcytosine, methyluracil, hydroxymethyluracil, dihydroxypentyluracil, and
other base modifications. These bases interact in a complementary fashion to form
base-pairs, such as, for example, guanine with cytosine and adenine with thymidine.
However, this invention also encompasses situations in which there is nontraditional
base pairing such as Hoogsteen base pairing which has been identified in certain tRNA
molecules and postulated to exist in a triple helix.

One embodiment of the invention is directed to a method for determining 
a nucleotide sequence by positional hybridization comprising the steps of (a) creating
a set of nucleic acid probes wherein each probe has a double-stranded portion, a
single-stranded portion, and a random sequence within the single-stranded portion which
is determinable, (b) hybridizing a nucleic acid target which is at least partly
single-stranded to the set of nucleic acid probes, and (c) determining the nucleotide
sequence of the target which hybridized to the single-stranded portion of any probe. The
set of nucleic acid probes and the target nucleic acid may comprise DNA, RNA, PNA, or
any combination thereof, and may be derived from natural sources, recombinant sources,
or be synthetically produced. Each probe of the set of nucleic acid probes has a
double-stranded portion which is preferably about 10 to 30 nucleotides in length, a
single-stranded portion which is preferably about 4 to 20 nucleotides in length, and a
random sequence within the single-stranded portion which is preferably about 4 to 20
nucleotides in length and more preferably about 5 nucleotides in length. A principal
advantage of this probe is in its structure. Hybridization of the target nucleic acid is
encouraged due to the favorable thermodynamic conditions established by the presence
of the adjacent double-strandedness of the probe. An entire set of probes contains at
least one example of every possible random nucleotide sequence.

By way of example only, if the random portion consisted of a four 
nucleotide sequence (R=4) of adenine, guanine, thymine, and cytosine, the total
number of possible combinations (4^R) would be 4^4 or 256 different nucleic acid probes.
If the number of nucleotides in the random sequence was five, the number of different
probes within the set would be 4^5 or 1,024. This becomes a very large number indeed
when considering sequences of 20 nucleotides or more.
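
The arithmetic above can be checked by direct enumeration. The following is a minimal sketch; the name `random_portions` is hypothetical, and the enumeration only lists the possible random sequences, not complete probes with their double-stranded portions.

```python
from itertools import product

BASES = "ACGT"


def random_portions(r):
    """Enumerate every possible random sequence of length r over the four bases."""
    return ["".join(p) for p in product(BASES, repeat=r)]


for r in (4, 5):
    print(r, len(random_portions(r)))

# For R = 20 enumeration is impractical, so only the count is computed.
print(20, 4 ** 20)
```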

However, to determine the complete sequence of a nucleic acid target, the
set of probes need not contain every possible combination of nucleotides of the random
sequence to be encompassed by the method of this invention. This variation of the
invention is based on the theory of degenerated probes proposed by S.C. Macevicz
(International Patent Application, US89-04741, published 1989, and herein specifically
incorporated by reference). The probes are divided into four subsets. In each, one of
the four bases is used at a defined number of positions and all other bases except that
one on the remaining positions. Probes from the first subset contain two elements, A
and non-A (A = adenosine). For a nucleic acid sequence of length k, there are
4(2^k - 1), instead of 4^k probes. Where k = 8, a set of probes would consist of only 1020
different members instead of the entire set of 65,536. The savings in time and expense
would be considerable. In addition, it is also a method of the present invention to
utilize probes wherein the random nucleotide sequence contains gapped segments, or
positions along the random sequence which will base pair with any nucleotide or at least
not interfere with adjacent base pairing.
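
The size comparison for k = 8 follows directly from the two expressions given above. The sketch below simply evaluates them; the function names are illustrative and nothing here models how the degenerate probes themselves are constructed.

```python
def full_set_size(k):
    """Number of probes when every sequence of length k is present: 4^k."""
    return 4 ** k


def degenerate_set_size(k):
    """Number of probes in the four degenerate subsets: 4(2^k - 1)."""
    return 4 * (2 ** k - 1)


for k in (4, 8, 12):
    print(k, degenerate_set_size(k), full_set_size(k))
```

The degenerate count grows as 2^k rather than 4^k, so the relative saving increases with probe length.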

Hybridization between complementary bases of DNA, RNA, PNA, or
combinations of DNA, RNA and PNA, occurs under a wide variety of conditions such
as variations in temperature, salt concentration, electrostatic strength, and buffer
composition. Examples of these conditions and methods for applying them are
described in Nucleic Acid Hybridization: A Practical Approach (B.D. Hames and S.J.
Higgins, editors, IRL Press, 1985), which is herein specifically incorporated by reference.
It is preferred that hybridization takes place between about 0°C and about 70°C, for
periods of from about 5 minutes to hours, depending on the nature of the sequence to
be hybridized and its length. For example, typical hybridization conditions for a mixture
of two 20-mers are to bring the mixture to 68°C and let cool to room temperature (22°C)
for five minutes or at very low temperatures such as 2°C in 2 microliters. It is also
preferred that hybridization between nucleic acids be facilitated using buffers such as
saline, Tris-EDTA (TE), Tris-HCl and other aqueous solutions, certain reagents and
chemicals. Preferred examples of these reagents include single-stranded binding
proteins such as Rec A protein, T4 gene 32 protein, E. coli single-stranded binding
protein, and major or minor nucleic acid groove binding proteins. Preferred examples
of other reagents and chemicals include divalent ions, polyvalent ions, and intercalating
substances such as ethidium bromide, actinomycin D, psoralen, and angelicin.

The nucleotide sequence of the random portion of each probe is 
determinable by methods which are well-known in the art. Two methods for
determining the sequence of the nucleic acid probe are by chemical cleavage, as
disclosed by Maxam and Gilbert (1977), and by chain extension using ddNTPs, as
disclosed by Sanger et al. (1977), both of which are herein specifically incorporated by
reference. Alternatively, another method for determining the nucleotide sequence of
a probe is to individually synthesize each member of a probe set. The entire set would
comprise every possible sequence within the random portion or some smaller portion
of the set. The method of the present invention could then be conducted with each
member of the set. Another procedure would be to synthesize one or more sets of
nucleic acid probes simultaneously on a solid support. Preferred examples of a solid
support include a plastic, a ceramic, a metal, a resin, a gel, and a membrane. A more
preferred embodiment comprises a two-dimensional or three-dimensional matrix, such
as a gel, with multiple probe binding sites, such as a hybridization chip as described by
Pevzner et al. (J. Biomol. Struc. & Dyn. 9:399-410, 1991), and by Maskos and Southern
(Nuc. Acids Res. 20:1679-84, 1992), both of which are herein specifically incorporated
by reference. Nucleic acids are bound to the solid support by covalent binding such as
by conjugation with a coupling agent, or by non-covalent binding such as an electrostatic
interaction or antibody-antigen coupling. Typical coupling agents include
biotin/streptavidin, Staphylococcus aureus protein A/IgG antibody Fc fragment, and
streptavidin/protein A chimeras (T. Sano and C.R. Cantor, Bio/Technology 9:1378-81,
1991).

Hybridization chips can be used to construct very large probe arrays which
are subsequently hybridized with a target nucleic acid. Analysis of the hybridization
pattern of the chip provides an immediate fingerprint identification of the target
nucleotide sequence. Patterns can be manually or computer analyzed, but it is clear that
positional sequencing by hybridization lends itself to computer analysis and automation.
Algorithms and software have been developed for sequence reconstruction which are
applicable to the methods described herein (R. Drmanac et al., J. Biomol. Struc. & Dyn.
5:1085-1102, 1991; P.A. Pevzner, J. Biomol. Struc. & Dyn. 7:63-73, 1989, both of which
are herein specifically incorporated by reference).
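
To give a feel for what computer reconstruction from a hybridization pattern involves, the following is a deliberately simple sketch. It is not the algorithm of either cited reference: it assumes an error-free set of words, a known starting word, and no repeated words, and it stops as soon as the next word is missing or ambiguous. The name `reconstruct` is hypothetical.

```python
def reconstruct(words, start):
    """Extend start one base at a time using words that overlap by n-1 bases.

    Stops at a dead end or at a branch (more than one candidate word).
    """
    n = len(start)
    remaining = set(words) - {start}
    sequence = start
    while remaining:
        suffix = sequence[-(n - 1):]
        candidates = [w for w in remaining if w[:-1] == suffix]
        if len(candidates) != 1:
            break  # no unique continuation
        sequence += candidates[0][-1]
        remaining.remove(candidates[0])
    return sequence


words = {"ATG", "TGC", "GCA", "CAG", "AGT"}
print(reconstruct(words, "ATG"))
```

A practical program would also have to handle false positives, false negatives, and the recurrences that make a continuation ambiguous.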

Preferably, target nucleic acids are labeled with a detectable label. Label
may be incorporated at a 5' terminal site, a 3' terminal site, or at an internal site within
the length of the nucleic acid. Preferred detectable labels include a radioisotope, a
stable isotope, an enzyme, a fluorescent chemical, a luminescent chemical, a chromatic
chemical, a metal, an electric charge, or a spatial structure. There are many procedures
whereby one of ordinary skill can incorporate detectable label into a nucleic acid. For
example, enzymes used in molecular biology will incorporate radioisotope labeled
substrate into nucleic acid. These include polymerases, kinases, and transferases. The
labeling isotope is preferably 32P, 35S, or

Label may be directly or indirectly detected using scintillation fluid or a
PhosphorImager, chromatic or fluorescent labeling, or mass spectrometry. Other, more
advanced methods of detection include evanescent wave detection of surface plasmon
resonance of thin metal film labels such as gold, by, for example, the BIAcore sensor
sold by Pharmacia, or other suitable biosensors. Alternatively, the probe may be labeled
and the target nucleic acid detected, identified and possibly sequenced from interaction
with the labeled probe. For example, a labeled probe or array of probes may be fixed
to a solid support. From an analysis of the binding observed after hybridization with a
biological sample containing nucleic acid, the target nucleic acid is identified.

Another embodiment of the invention is directed to methods for
determining a sequence of a nucleic acid comprising the steps of labeling the nucleic
acid with a first detectable label at a terminal site, labeling the nucleic acid with a
second detectable label at an internal site, identifying the nucleotide sequences of
portions of the nucleic acid, determining the relationship of the nucleotide sequence
portions to the nucleic acid by comparing the first detectable label and the second
detectable label, and determining the nucleotide sequence of the nucleic acid.
Fragments of target nucleic acids labeled both terminally and internally can be
distinguished based on the relative amounts of each label within respective fragments.
Fragments of a target nucleic acid terminally labeled with a first detectable label will
have the same amount of label as fragments which include the labeled terminus.
However, these fragments will have variable amounts of the internal label directly
proportional to their size and distance from the terminus. By comparing the relative
amount of the first label to the relative amount of the second label in each fragment,
one of ordinary skill is able to determine the position of the fragment or the position
of the nucleotide sequence of that fragment within the whole nucleic acid.
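
The label comparison described above can be illustrated with a short sketch. It assumes, purely for illustration, that the internal label is incorporated uniformly along the strand, so that the internal signal grows with fragment length while the terminal signal is the same for every fragment that retains the labeled terminus. The function names, the calibration constant ratio_per_nt, and the signal values are hypothetical and are not taken from the specification.

```python
def estimate_length(terminal, internal, ratio_per_nt):
    """Estimate fragment length (nt) from the internal/terminal label ratio.

    ratio_per_nt is an assumed calibration: how much the internal/terminal
    ratio increases for each nucleotide of fragment length.
    """
    if terminal <= 0:
        raise ValueError("fragment does not carry the terminal label")
    return (internal / terminal) / ratio_per_nt


def order_by_position(fragments, ratio_per_nt):
    """Sort fragments by estimated distance of their far end from the labeled terminus."""
    estimates = {
        name: estimate_length(terminal, internal, ratio_per_nt)
        for name, (terminal, internal) in fragments.items()
    }
    return sorted(estimates.items(), key=lambda item: item[1])


# Hypothetical (terminal signal, internal signal) pairs for three fragments.
signals = {"A": (100.0, 500.0), "B": (100.0, 200.0), "C": (50.0, 400.0)}
ordered = order_by_position(signals, ratio_per_nt=0.1)
for name, length in ordered:
    print(name, round(length))
```

Because only the ratio of the two signals is used, a fragment recovered in lower yield (fragment C in the sketch) is still placed correctly.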

Another embodiment of the invention is directed to methods for
determining a nucleotide sequence by hybridization comprising the steps of creating a
set of nucleic acid probes wherein each probe has a double-stranded portion, a
single-stranded portion, and a random sequence within the single-stranded portion which is
determinable, hybridizing a nucleic acid target which is at least partly single-stranded to
the set, ligating the hybridized target to the probe, and determining the nucleotide sequence
of the target which is hybridized to the single-stranded portion of any probe. This
embodiment adds a step wherein the hybridized target is ligated to the probe. Ligation
of the target nucleic acid to the complementary probe increases fidelity of hybridization
and allows incorrectly hybridized target to be easily washed from correctly hybridized
target (Figure 11). More importantly, the addition of a ligation step allows
hybridizations to be performed under a single set of hybridization conditions. For
example, hybridization temperature is preferably between about 22-37°C, the salt
concentration is preferably between about 0.05-0.5 M, and the period of
hybridization is between about 1-14 hours. This is not possible using the methodologies
of the current procedures, which do not employ a ligation step, and represents a very
substantial improvement. Ligation can be accomplished using a eukaryotic-derived or
a prokaryotic-derived ligase. Preferred is T4 DNA or RNA ligase. Methods for use of
these and other nucleic acid modifying enzymes are described in Current Protocols in
Molecular Biology (F.M. Ausubel et al., editors, John Wiley & Sons, 1989), which is
herein specifically incorporated by reference.

There are a number of distinct advantages to the incorporation of a
ligation step. First and foremost is that one can use identical hybridization conditions
for hybridization. Variation of hybridization conditions due to base composition is no
longer relevant, as nucleic acids with high A/T or G/C content ligate with equal
efficiency. Consequently, discrimination is very high between matches and mismatches,
much higher than has been achieved using other methodologies such as Southern (1989),
wherein the effects of G/C content were only somewhat neutralized in high
concentrations of quaternary or tertiary amines (e.g. 3 M tetramethyl ammonium
chloride in Drmanac et al., 1993).

Another embodiment of the invention is directed to methods for
determining a nucleotide sequence by hybridization which comprises the steps of
creating a set of nucleic acid probes wherein each probe has a double-stranded portion,
a single-stranded portion, and a random sequence within the single-stranded portion
which is determinable, hybridizing a target nucleic acid which is at least partly
single-stranded to the set of nucleic acid probes, enzymatically extending a strand of the probe
using the hybridized target as a template, and determining the nucleotide sequence of
the single-stranded portion of the target nucleic acid. This embodiment of the invention
is similar to the previous embodiment, as broadly described herein, and includes all of
the aspects and advantages described therein. An alternative embodiment also includes
a step wherein hybridized target is ligated to the probe. Ligation increases the fidelity
of the hybridization and allows for a more stringent wash step wherein incorrectly
hybridized, unligated target can be removed and, further, allows for a single set of
hybridization conditions to be employed. Most nonligation techniques, including
Southern (1989), Drmanac et al. (1993), and Khrapko et al. (1989 and 1991), are only
accurate, and only marginally so, when hybridizations are performed under optimal
conditions which vary with the G/C content of each interaction. Preferable conditions
comprise a hybridization temperature of between about 22-37°C, a salt concentration
of between about 0.05-0.5 M, and a hybridization period of between about 1-14 hours.

Hybridization produces either a 5' overhang or a 3' overhang of target
nucleic acid. Where there is a 5' overhang, a 3'-hydroxyl is available on one strand of
the probe from which nucleotide addition can be initiated. Preferred enzymes for this
process include eukaryotic or prokaryotic polymerases such as T3 or T7 polymerase,
Klenow fragment, or Taq polymerase. Each of these enzymes is readily available to
those of ordinary skill in the art, as are procedures for their use (Current Protocols in
Molecular Biology).

Hybridized probes may also be enzymatically extended a predetermined
length. For example, reaction conditions can be established wherein a single dNTP or
ddNTP is utilized as substrate. Only hybridized probes wherein the first nucleotide to
be incorporated is complementary to the target sequence will be extended, thus
providing additional hybridization fidelity and additional information regarding the
nucleotide sequence of the target. Sanger (1977) or Maxam and Gilbert (1977)
sequencing can be performed which would provide further target sequence data.
Alternatively, hybridization of target to probe can produce 3' extensions of target
nucleic acids. Hybridized probes can be extended using nucleoside biphosphate
substrates or short sequences which are ligated to the 5' terminus.
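
The single-nucleotide extension described above can be sketched as a simple filter. The sketch assumes that each hybrid is represented only by the unpaired template bases of the target, listed in the order in which a polymerase would copy them; the probe names, sequences, and function name are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extends_with(template_overhang, supplied_base):
    """Return True if a probe primed on this template can incorporate supplied_base.

    template_overhang lists the unpaired target bases in copying order, so only
    its first base decides whether the single supplied nucleotide is added.
    """
    if not template_overhang:
        return False
    return COMPLEMENT[template_overhang[0]] == supplied_base


# Hypothetical hybrids: probe name -> unpaired target bases (copying order).
hybrids = {"probe1": "GATT", "probe2": "ACCA", "probe3": "GGTA"}

# Supply only dCTP (or ddCTP): probes whose first template base is G extend.
extended = sorted(name for name, overhang in hybrids.items()
                  if extends_with(overhang, "C"))
print(extended)
```

Repeating the reaction with each of the four nucleotides in turn would reveal one additional base of target sequence per hybridized probe.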

Another embodiment of the invention is directed to a method for
determining a nucleotide sequence of a target by hybridization comprising the steps of
creating a set of nucleic acid probes wherein each probe has a double-stranded portion,
a single-stranded portion, and a random nucleotide sequence within the single-stranded
portion which is determinable, cleaving a plurality of nucleic acid targets to form
fragments of various lengths which are at least partly single-stranded, hybridizing the
single-stranded region of the fragments with the single-stranded region of the probes,
identifying the nucleotide sequences of the hybridized portions of the fragments, and
comparing the identified nucleotide sequences to determine the nucleotide sequence of
the target. An alternative embodiment includes a further step wherein the hybridized
fragments are ligated to the probes prior to identifying the nucleotide sequences of the
hybridized portions of the fragments. As described herein, the addition of a ligation
step allows for hybridizations to be performed under a single set of hybridization
conditions.
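
The comparison step of this embodiment can be sketched as follows. The sketch assumes an idealized nested set in which all fragments share one end and differ only at the other, and assumes that each fragment's length and its terminal k-mer (read from the probe it hybridizes to) are both known. The function names and the example target are hypothetical.

```python
import random


def nested_set(target, k):
    """Simulate partial cleavage: every fragment of length >= k sharing the 5' end."""
    return [target[:n] for n in range(k, len(target) + 1)]


def terminal_read(fragment, k):
    """What one hybridization reveals: fragment length and its 3'-terminal k-mer."""
    return (len(fragment), fragment[-k:])


def reconstruct(reads):
    """Merge (length, terminal k-mer) reads, shortest fragment first."""
    reads = sorted(set(reads))
    prev_len, seq = reads[0]
    for length, kmer in reads[1:]:
        step = length - prev_len
        overlap = len(kmer) - step
        if step < 1 or overlap < 0:
            raise ValueError("fragment lengths leave a gap the reads cannot bridge")
        if overlap and not seq.endswith(kmer[:overlap]):
            raise ValueError("terminal reads are inconsistent")
        seq += kmer[overlap:]
        prev_len = length
    return seq


target = "ATGCGTACCTGA"  # hypothetical target
reads = [terminal_read(fragment, 5) for fragment in nested_set(target, 5)]
random.shuffle(reads)  # hybridization gives the reads in no particular order
rebuilt = reconstruct(reads)
print(rebuilt)
```

The nested set need not be complete: as long as consecutive fragments differ by no more than k nucleotides, adjacent terminal reads still abut or overlap and the sequence can be joined.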

In these embodiments, target nucleic acid is partially cleaved, forming a
plurality of nucleic acid fragments of various lengths, a nested set, which is then
hybridized to the probe. It is preferred that cleavage occurs by enzymatic, chemical or
physical means. Preferred enzymes for partial cleavage are exonuclease III, S1 nuclease,
DNase I, Bal 31, mung bean nuclease, P1 nuclease, lambda exonuclease, restriction
endonuclease, and RNase I. Preferred means for chemical cleavage are ultraviolet light
induced cleavage, ethidium bromide induced cleavage, and cleavage induced with acid
or base. Preferred means for mechanical cleavage are shearing through direct agitation
such as vortexing or multiple cycles of freeze-thawing. Procedures for enzymatic,
chemical or physical cleavage are disclosed in, for example, Molecular Cloning: A
Laboratory Manual (T. Maniatis et al., editors, Cold Spring Harbor 1989), which is
herein specifically incorporated by reference.

Fragmented target nucleic acids will have a distribution of terminal
sequences which is sufficiently broad so that the nucleotide sequence of the hybridized
fragments will include the entire sequence of the target nucleic acid. A preferred
method is wherein the set of nucleic acid probes is fixed to a solid support. A preferred
solid support is a plastic, a ceramic, a metal or magnetic substance, a resin, a film or
other polymer, a gel or a membrane, and it is more preferred that the solid support be
a two-dimensional or three-dimensional matrix with multiple probe binding sites such
as a hybridization chip as described by K.R. Khrapko et al. (J. DNA Sequencing and
Mapping 1:375-88, 1991). It is also preferred wherein the target nucleic acid has a
detectable label such as a radioisotope, a stable isotope, an enzyme, a fluorescent
chemical, a luminescent chemical, a chromatic chemical, a metal, an electric charge, or
a spatial structure.

As an extension of this procedure, it is also possible to use the methods
herein described to determine the nucleotide sequence of one or more probes which
hybridize with an unknown target sequence. For example, fragmented targets could be
terminally or internally labeled, hybridized with a set of nucleic acid probes, and the
hybridized sequences of the probes determined. This aspect may be useful when it is
cumbersome to determine the sequence of the entire target and only a smaller region
of that sequence is of interest.

Another embodiment of the invention is directed to a method wherein the
target nucleic acid has a first detectable label at a terminal site and a second detectable
label at an internal site. The labels may be the same type of label or of different types
as long as each can be discriminated, preferably by the same detection method. It is
preferred that the first and second detectable labels are chromatic or fluorescent
chemicals or molecules which are detectable by mass spectrometry. Using a
double-labeling method coupled with analysis by mass spectrometry provides a very rapid and
accurate sequencing methodology that can be incorporated in sequencing by
hybridization and lends itself very well to automation and computer control.

Another embodiment of the invention is directed to methods for creating
a nucleic acid probe comprising the steps of synthesizing a plurality of single-stranded
first nucleic acids and an array of longer single-stranded second nucleic acids
complementary to the first nucleic acid with a random terminal nucleotide sequence,
hybridizing the first nucleic acids to the second nucleic acids to form hybrids having a
double-stranded portion and a single-stranded portion with the random nucleotide
sequence within the single-stranded portion, hybridizing a single-stranded nucleic acid
target to the hybrids, ligating the hybridized target to the first nucleic acid of the hybrid,
isolating the second nucleic acid, and hybridizing the first nucleic acid of the synthesizing
step with the isolated second nucleic acid to form a nucleic acid probe. Probes created
in this manner are referred to herein as customized probes.

A preferred customized probe comprises a first nucleic acid which is about
15-25 nucleotides in length and a second nucleic acid which is about 20-30 nucleotides in
length. It is also preferred that the double-stranded portion contain an enzyme
recognition site which allows for increased flexibility of use and facilitates cloning,
should it at some point become desirable to clone one or more of the probes. It is also
preferred if the customized probe is fixed to a solid support, such as a plastic, a
ceramic, a metal, a resin, a film or other polymer, a gel, or a membrane, or possibly a
two- or three-dimensional array such as a chip or microchip.

Customized probes, created by the method of this invention, have a wide
range of uses. These probes are, first of all, structurally useful for identifying and
binding to only those sequences which are homologous to the overhangs. Secondly, the
overhangs of these probes possess the nucleotide sequence of interest. No further
manipulation is required to carry the sequence of interest to another structure.
Therefore, the customized probes greatly lend themselves to use in, for example,
diagnostic aids for the genetic screening of a biological sample.

Another embodiment of the invention is directed to arrays of nucleic acid
probes wherein each probe comprises a double-stranded portion of length D, a terminal
single-stranded portion of length S, and a random nucleotide sequence within the
single-stranded portion of length R. Preferably, D is between about 3-20 nucleotides and S
is between about 3-20 nucleotides and the entire array is fixed to a solid support which
may be composed of plastics, ceramics, metals, resins, polymers and other films, gels,
membranes and two-dimensional and three-dimensional matrices such as hybridization
chips or microchips. Probe arrays are useful in sequencing and diagnostic applications
when the sequence and/or position on a solid support of every probe of the array is
known or is unknown. In either case, information about the target nucleic acid may be
obtained and the target nucleic acid detected, identified and sequenced as described in
the methods described herein. Arrays comprise 4^R different probes representing every
member of the random sequence of length R, but arrays of less than 4^R are also
encompassed by the invention.
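
The size of a complete array follows directly from the length R of the random sequence. As a minimal sketch, assuming each probe is represented simply as a pair of a constant duplex sequence and a single-stranded overhang, the 4^R members can be enumerated as below; the duplex sequence and the function names are hypothetical.

```python
from itertools import product


def random_overhangs(r):
    """All 4**r possible single-stranded sequences of length r."""
    return ["".join(bases) for bases in product("ACGT", repeat=r)]


def build_array(constant_duplex, r):
    """Pair one constant duplex with every possible overhang of length r."""
    return [(constant_duplex, overhang) for overhang in random_overhangs(r)]


# Hypothetical 12-nucleotide duplex with a 5-nucleotide random overhang.
array = build_array("GATCCGAATTCA", 5)
print(len(array))  # 4**5 probes
```

An incomplete array, as contemplated above, would correspond to any subset of this list.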

Another embodiment of the invention is directed to a method for creating
probe arrays comprising the steps of synthesizing a first set of nucleic acids each
comprising a constant sequence of length C at a 3'-terminus and a random sequence of
length R at a 5'-terminus, synthesizing a second set of nucleic acids each comprising a
sequence complementary to the constant sequence of each of the first nucleic acids, and
hybridizing the first set with the second set to create the array. Preferably, the nucleic
acids of the first set are each between about 15-30 nucleotides in length and the nucleic
acids of the second set are each between about 10-25 nucleotides in length. Also
preferable is that C is between about 7-20 nucleotides and R is between about 3-10
nucleotides.

Arrays may comprise about 4^R different probes, but in certain applications,
an entire array of every possible sequence is not necessary and incomplete arrays are
acceptable for use. For example, incomplete arrays may be utilized for screening
procedures of very rare target nucleic acids where nonspecific hybridization is not
expected to be problematic. Further, every member of an array may not be needed
when detecting or sequencing smaller nucleic acids where the chance of requiring
certain combinations of nucleotides is so low as to be practically nonexistent. Arrays
which are fixed to solid supports are expected to be most useful, although arrays in
solution also have many applications. Solid supports which are useful include plastics
such as microtiter plates, beads and microbeads, ceramics, metals where resilience is
desired or magnetic beads for ease of isolation, resins, gels, polymers and other films,
membranes or chips such as the two- and three-dimensional sequencing chips utilized
in sequencing technology.

Alternatively, probe arrays may also be made which are single-stranded.
These arrays are created, preferably on a solid support, basically as described, by
synthesizing an array of nucleic acids each comprising a constant sequence of length C
at a 3'-terminus and a random sequence of length R at a 5'-terminus, and fixing the
array to a first solid support. Arrays created in this manner can be quickly and easily
transformed into double-stranded arrays by the synthesis and hybridization of a set of
nucleic acids with a sequence complementary to the constant sequence of the replicated
array to create a double-stranded replicated array. However, in the present form,
single-stranded arrays are very valuable as templates for replication of the array.

Due to the very large numbers of probes which comprise most useful
arrays, there is a great deal of time spent in simply creating the array. It requires many
hours of nucleic acid synthesis to create each member of the array and many hours of
manipulations to place the array in an organized fashion onto any solid support such as
those described previously. Once the master array is created, replicated arrays, or slaves,
can be quickly and easily created by the methods of the invention which take advantage
of the speed and accuracy of nucleic acid polymerases. Basically, methods for
replicating an array of single-stranded probes on a solid support comprise the steps of
synthesizing an array of nucleic acids each comprising a constant sequence of length C
at a 3'-terminus and a random sequence of length R at a 5'-terminus, fixing the array
to a first solid support, synthesizing a set of nucleic acids each comprising a sequence
complementary to the constant sequence, hybridizing the nucleic acids of the set with the
array, enzymatically extending the nucleic acids of the set using the random sequences
of the array as templates, denaturing the set of extended nucleic acids, and fixing the
denatured nucleic acids of the set to a second solid support to create the replicated
array of single-stranded probes.
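
The replication steps above can be sketched in a few lines. The sketch assumes idealized, error-free primer extension and writes every strand 5' to 3'; the constant sequence, the master strands, and the function names are hypothetical.

```python
COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT_TABLE)[::-1]


def replicate(master, constant):
    """Model primer extension on each master strand.

    Master strands carry the random sequence at the 5' end and the constant
    sequence at the 3' end. The primer (complement of the constant sequence)
    anneals at the 3' end and is extended across the random sequence.
    """
    primer = revcomp(constant)
    replica = []
    for strand in master:
        if not strand.endswith(constant):
            raise ValueError("strand lacks the constant 3' sequence")
        random_part = strand[: -len(constant)]
        replica.append(primer + revcomp(random_part))
    return replica


constant = "CTGAGTCA"                         # hypothetical constant sequence
master = ["AAC" + constant, "GTT" + constant]  # hypothetical master strands
replica = replicate(master, constant)
print(replica)
```

In this model each replica strand is the full reverse complement of its master strand, so the replicated array carries the complement of each random sequence at its 3' end.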

Denaturation of the array can be performed by subjecting the array to
heat, for example 90°-100°C for 2-15 minutes, or highly alkaline conditions, such as by
the addition of sodium hydroxide. Denaturation can also be accomplished by adding
organic solvents, nucleic acid binding proteins or enzymes which promote denaturation
to the array. Preferably, the solid supports are coated with a substance such as
streptavidin and the nucleic acid reagents conjugated with biotin. Denaturation of the
partial duplex leads to binding of the nucleic acids to the solid support.

Another embodiment of the invention is directed to methods for creating
arrays of probes comprising the steps of synthesizing an array of single-stranded nucleic
acids each containing a constant sequence at the 3'-terminus, another constant sequence
at the 5'-terminus, and a random internal sequence of length R flanked by the cleavage
site(s) of a restriction enzyme (on one or both sides), synthesizing an array of primers
each complementary to a portion of the constant sequence of the 3'-terminus,
hybridizing the two arrays together to form hybrids, extending the sequence of each
primer by polymerization using a sequence of the nucleic acid as a template, and
cleaving the extended hybrids with the restriction enzyme to form an array of probes
with a double-stranded portion at one terminus and a single-stranded portion containing the
random sequence at the opposite terminus. Preferably, the nucleic acids are each
between about 10-50 nucleotides in length and R is between about 3-5 nucleotides in
length. Any of the restriction enzymes which produce a 3'- or 5'-overhang after cleavage
are suitable for use to make the array. Some of the restriction enzymes which are useful
in this regard, and their recognition sequences, are depicted in Table 1.

Table 1

Restriction    Recognition Sequence         Overhang
Enzyme

AlwNI          5'-CAG NNN↓CTG               3'
               3'-GTC↑NNN GAC

BbvI           5'-GCAGC(N)8↓                5'
               3'-CGTCG(N)12↑

BglI           5'-GCCN NNN↓NGGC             3'
               3'-CGGN↑NNN NCCG

BstXI          5'-CCAN NNNN↓NTGG            3'
               3'-GGTN↑NNNN NACC

DraIII         5'-CAC NNN↓GTG               3'
               3'-GTG↑NNN CAC

FokI           5'-GGATG(N)9↓                5'
               3'-CCTAC(N)13↑

HgaI           5'-GACGC(N)5↓                5'
               3'-CTGCG(N)10↑

PflMI          5'-CCAN NNN↓NTGG             3'
               3'-GGTN↑NNN NACC

SfaNI          5'-GCATC(N)5↓                5'
               3'-CGTAG(N)9↑

SfiI           5'-GGCCN NNN↓NGGCC           3'
               3'-CCGGN↑NNN NCCGG
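
Whether a given enzyme leaves a 5'- or a 3'-overhang follows from where it cuts each strand. The sketch below is illustrative only: it assumes both cut positions are counted as the number of base pairs to the left of the cut when the site is read along the top strand 5' to 3'. The function name and the dictionary are hypothetical; the cut positions are those shown for AlwNI, BglI and DraIII in Table 1, together with the commonly cited FokI specificity of 9 and 13 nucleotides beyond its five-base recognition sequence.

```python
def overhang(top_cut, bottom_cut):
    """Classify the end left by a staggered cut.

    Both positions count base pairs to the left of the cut, reading the
    top strand 5'->3'. Returns (overhang type, overhang length in nt).
    """
    if top_cut == bottom_cut:
        return ("blunt", 0)
    if top_cut < bottom_cut:
        # The bottom strand of the left-hand fragment protrudes with a 5' end.
        return ("5'", bottom_cut - top_cut)
    # The top strand of the left-hand fragment protrudes with a 3' end.
    return ("3'", top_cut - bottom_cut)


# (top-strand cut, bottom-strand cut) for a few enzymes.
cut_sites = {
    "AlwNI": (6, 3),     # CAGNNN^CTG
    "BglI": (7, 4),      # GCCNNNN^NGGC
    "DraIII": (6, 3),    # CACNNN^GTG
    "FokI": (14, 18),    # GGATG(N)9 / (N)13
}

ends = {name: overhang(*cuts) for name, cuts in cut_sites.items()}
for name, end in ends.items():
    print(name, end)
```

For probe arrays made by this route, the overhang length sets an upper bound on how much of the random sequence is exposed as single-stranded after cleavage.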

Also preferred is that the array be fixed to a solid support such as a plastic,
ceramic, metal, resin, polymer, gel, film, membrane or chip. Fixation can be
accomplished by conjugating the reagents for synthesis with a specific binding protein
or other similar substance and coating the surface of the support with the binding
counterpart (e.g. biotin/streptavidin, Fc/protein A, nucleic acid/nucleic acid binding
protein).

Alternatively, another similar method for creating an array of probes
comprises the steps of synthesizing an array of single-stranded nucleic acids each
containing a constant sequence at the 3'-terminus, another constant sequence at the
5'-terminus, and a random internal sequence of length R flanked by the cleavage site(s)
of a restriction enzyme (on one or both sides), synthesizing an array of primers with a
sequence complementary to the constant sequence at the 3'-terminus, hybridizing the two
arrays together to form hybrids, enzymatically extending the primers using the nucleic
acids as templates to form full-length hybrids, cloning the full-length hybrids into vectors
such as plasmids or phage, cloning the plasmids into competent bacteria or phage,
reisolating the cloned plasmid DNA, amplifying the cloned sequences by multiple
polymerase chain reactions, and cleaving the amplified sequences with the restriction
enzyme to form the array of probes with a double-stranded portion at one terminus and
a single-stranded portion containing the random sequence at the opposite terminus.
Using this method the array of probes may have 5'- or 3'-overhangs depending on the
cleavage specificity of the restriction enzyme (e.g. Table 1). The array of probes may
be fixed to a solid support such as a plastic, ceramic, metal, resin, polymer, film, gel,
membrane or chip. Preferably, during PCR amplification, the reagent primers are
conjugated with biotin which facilitates eventual binding to a streptavidin coated surface.

Another embodiment of the invention is directed to methods for using
customized probes, arrays, and replicated arrays, as described herein, in diagnostic aids
to screen biological samples for specific nucleic acid sequences. Diagnostic aids and
methods for using diagnostic aids would be very useful when sequence information at
a particular locus of, for example, DNA is desired. Single nucleotide mutations or more
complex nucleic acid fingerprints can be identified and analyzed quickly, efficiently, and
easily. Such an approach would be immediately useful for the detection of individual
and family genetic variation, of inherited mutations such as those which cause a disease,
DNA dependent normal phenotypic variation, DNA dependent somatic variation, and
the presence of heterologous nucleic acid sequences.

Especially useful are diagnostic aids comprising probe arrays. These
arrays can make the detection, identification, and sequencing of nucleic acids from
biological samples exceptionally rapid and allow one to obtain multiple pieces of
information from a single sample after performing a single test. Methods for detecting
and/or identifying a target nucleic acid in a biological sample comprise the steps of
creating an array of probes fixed to a solid support as described herein, labeling the
nucleic acid of the biological sample with a detectable label, hybridizing the labeled
nucleic acid to the array and detecting the sequence of the nucleic acid from a binding
pattern of the label on the array. These methods for creating probe arrays and for
rapidly and efficiently replicating those arrays, such as for diagnostic aids, make the
manufacture and commercial application of large numbers of arrays a possibility.
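
Reading a result from a binding pattern can be sketched with a deliberately simplified model. It assumes that each labeled fragment is captured only by the probe whose single-stranded overhang is the exact reverse complement of the fragment's 3'-terminal k-mer, and that mismatched hybrids are fully removed. The fragment sequences and function names are hypothetical.

```python
COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT_TABLE)[::-1]


def binding_pattern(fragments, k):
    """Set of probe overhangs (5'->3') that would carry label after hybridization."""
    return {revcomp(fragment[-k:]) for fragment in fragments if len(fragment) >= k}


# Hypothetical fragments from a reference sample and from a test sample that
# differs by a single nucleotide near the end of the second fragment.
reference = ["ACGTTAGC", "ACGTTAGCCA"]
sample = ["ACGTTAGC", "ACGTTAGCTA"]

reference_pattern = binding_pattern(reference, 5)
sample_pattern = binding_pattern(sample, 5)
new_spots = sample_pattern - reference_pattern
lost_spots = reference_pattern - sample_pattern
print(new_spots, lost_spots)
```

In this model a single nucleotide change shows up as one array position gaining label and one losing it, which is the kind of difference a diagnostic comparison would look for.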

As described, these diagnostic aids are useful to humans, other animals,
and even plants for the detection of infections due to viruses, bacteria, fungi or yeast,
and for the detection of certain parasites. These detection methods and aids are also
useful in the feed and food industries and in the environmental field for the detection,
identification and sequencing of nucleic acids associated with samples obtained from
environmental sources and from manufacturing products and by-products.

Diagnostic aids comprise specific nucleic acid probes fixed to a solid
support to which is added the biological sample. Hybridization of target nucleic acids
is determined by adding a detectable label, such as a labeled antibody, which will
specifically recognize only hybridized targets or, alternatively, unhybridized target is
washed off and labeled target specific antibodies are added. In either case, appearance
of label on the solid support indicates the presence of nucleic acid target hybridized to
the probe and consequently, within the biological sample.

Customized probes may also prove useful in prophylaxis or therapy by
directing a drug, antigen, or other substance to a nucleic acid target with which it will
hybridize. The substance to be targeted can be bound to the probe so as not to
interfere with possible hybridization. For example, if the probe was targeted to a viral
nucleic acid target, an effective antiviral could be bound to the probe which will then
be able to specifically carry the antiviral to infected cells. This would be especially
useful when the treatment is harmful to normal cells and precise targeting is required
for efficacy.

Another embodiment of the invention is directed to methods for creating
a nucleic acid probe comprising the steps of synthesizing a plurality of single-stranded
first nucleic acids and an array of longer single-stranded second nucleic acids
complementary to the first nucleic acid with a random terminal nucleotide sequence,
hybridizing the first nucleic acids to the second nucleic acids to form hybrids having a
double-stranded portion and a single-stranded portion with the random nucleotide
sequence within the single-stranded portion, hybridizing a single-stranded nucleic acid
target to the hybrids, ligating the hybridized target to the first nucleic acid of the hybrid,
hybridizing the ligated hybrid with an array of oligonucleotides with random nucleotide
sequences, ligating the hybridized oligonucleotide to the second nucleic acid of the
ligated hybrid, isolating the second nucleic acid, and hybridizing another first nucleic
acid with the isolated second nucleic acid to form a nucleic acid probe. Preferred is that
the first nucleic acid is about 15-25 nucleotides in length, that the second nucleic acid
is about 20-30 nucleotides in length, that the constant portion contain an enzyme
recognition site, and that the oligonucleotides are each about 4-20 nucleotides in length.
Probes may be fixed to a solid support such as a plastic, a ceramic, a metal, a resin, a gel,
or a membrane. It is preferred that the solid support be a two-dimensional or
three-dimensional matrix with multiple probe binding sites such as a hybridization chip.
Nucleic acid probes created by the method of the present invention are useful in a
diagnostic aid to screen a biological sample for genetic variations of nucleic acid
sequences therein.

Another embodiment of the invention is directed to a method for creating
a nucleic acid probe comprising the steps of (a) synthesizing a plurality of
single-stranded first nucleic acids and a set of longer single-stranded second nucleic acids
complementary to the first nucleic acid with a random terminal nucleotide sequence, (b)
hybridizing the first nucleic acids to the second nucleic acids to form hybrids having a
double-stranded portion and a single-stranded portion with the random nucleotide
sequence in the single-stranded portion, (c) hybridizing a single-stranded nucleic acid
target to the hybrids, (d) ligating the hybridized target to the first nucleic acid of the
hybrid, (e) enzymatically extending the second nucleic acid using the target as a
template, (f) isolating the extended second nucleic acid, and (g) hybridizing the first
nucleic acid of step (a) with the isolated second nucleic acid to form a nucleic acid
probe. It is preferred that the first nucleic acid is about 15-25 nucleotides in length, that
the second nucleic acid is about 20-30 nucleotides in length, and that the
double-stranded portion contain an enzyme recognition site. It is also preferred that the probe
be fixed to a solid support, such as a plastic, a ceramic, a metal, a resin, a gel, or a
membrane. A preferred solid support is a two-dimensional or three-dimensional matrix
with multiple probe binding sites, such as a hybridization chip. A further embodiment
of the present invention is a diagnostic aid comprising the created nucleic acid probe
and a method for using the diagnostic aid to screen a biological sample as herein
described.

As an extension of this procedure, it is also possible to use the methods
herein described to determine the nucleotide sequence of one or more probes which
hybridize with an unknown target sequence. For example, Sanger dideoxynucleotide
sequencing techniques could be used when enzymatically extending the second nucleic
acid using the target as a template and labeled substrate, extended products could be
resolved by polyacrylamide gel electrophoresis, and the hybridized sequences of the
probes easily read off the gel. This aspect may be useful when it is cumbersome to
determine the sequence of the entire target and only a smaller region of that sequence
is of interest.

The following examples illustrate embodiments of the invention, but
should not be viewed as limiting the scope of the invention.






Example 1 

Manipulation of DNA in the solid state. Complexes between streptavidin
(or avidin) and biotin represent the standard way in which much solid state DNA
sequencing or other DNA manipulation is done, and one of the standard ways in which
non-radioactive detection of DNA is carried out. Over the past few years
streptavidin-biotin technology has expanded in several ways. Several years ago, the gene for
streptavidin was cloned and sequenced (C.E. Argarana et al., Nuc. Acids Res. 14:1871,
1986). More recently, using the Studier T7 system, overexpression of the protein in E.
coli was achieved (T. Sano and C.R. Cantor, Proc. Natl. Acad. Sci. USA 87:142, 1990).
In the last year, mutant streptavidins modified for improved solubility properties and
firmer attachment to solid supports were also expressed (T. Sano and C.R. Cantor,
Bio/Technology 9:1378-81, 1993). The most relevant of these is core streptavidin (fully
active protein with extraneous N- and C-terminal peptides removed) with 5 cysteine
residues attached to the C-terminus. An active protein fusion of streptavidin to two IgG
binding domains of staphylococcal protein A was also produced (T. Sano and C.R.
Cantor, Bio/Technology 9:1378-81, 1991). This allowed biotinylated DNAs to be
attached to specific Immunoglobulin G molecules without the need for any covalent
chemistry, and it has led to the development of immuno-PCR, an exceedingly sensitive
method for detecting antigens (T. Sano et al., Sci. 258:120-29, 1992).



PCT/US93/10616
WO 94/11530



A protein fusion between streptavidin and metallothionein was recently
constructed (T. Sano et al., Proc. Natl. Acad. Sci. USA, 1992). Both partners in this
protein fusion are fully active and these streptavidin-biotin interactions are being used
to develop new methods for purification of DNA, including triplex-mediated capture of
duplex DNA on magnetic microbeads (T. Ito et al., Proc. Natl. Acad. Sci. USA 89:495-98,
1992) and affinity capture electrophoresis of DNA in agarose (T. Ito et al., G.A.T.A.,
1992).

An examination of the potential advantages of stacking hybridization has
been carried out by both calculations and pilot experiments. Some calculated Tm's for
perfect and mismatched duplexes are shown in Figure 1. These are based on average
base compositions. The calculations were performed using the equations given by J.G.
Wetmur (Crit. Rev. in Biochem. and Mol. Biol. 26:227-59, 1991). In the case of
oligonucleotide stacking, these researchers assumed that the first duplex is fully formed
under the conditions where the second oligomer is being tested; in practice this may not
always be the case. It will, however, be the case for the configuration shown in Figure
1. The calculations reveal a number of interesting features about stacking hybridization.
Note that the binding of a second oligomer next to a pre-formed duplex provides an
extra stability equal to about two base pairs. More interesting still is the fact that
mispairing seems to have a larger consequence on stacking hybridization than it does
on ordinary hybridization. This is consistent with the very large effects seen by K.R.
Khrapko et al. (J. DNA Sequencing and Mapping 1:375-88, 1991) for certain types of
mispairing. Other types of mispairing are less destabilizing, but these can be eliminated
by requiring a ligation step. In standard SBH, a terminal mismatch is the least
destabilizing event, and thus leads to the greatest source of ambiguity or background.
For an octanucleotide complex, an average terminal mismatch leads to a 6°C lowering
in Tm. For stacking hybridization, a terminal mismatch on the side away from the
pre-existing duplex is the least destabilizing event. For a pentamer, this leads to a drop
in Tm of 10°C. These considerations indicate that the discrimination power of stacking
hybridization in favor of perfect duplexes might be greater than ordinary SBH.






Example 2 

Terminal sequencing by positional hybridization. The basic sequencing
by hybridization scheme is depicted in Figure 2. It is different from any other because
it uses a duplex oligonucleotide array with 3'-ended single-stranded overhangs. The
duplex portion of each DNA shown is constant. Only the overhangs vary, and in
principle an array of 4^n probes is needed to represent all possible overhangs of length
n. The advantage of such an array is that it provides enhanced sequence stringency in
detecting the 5' terminal nucleotide of the target DNA because of base stacking between
the preformed DNA duplex and the newly formed duplex.
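
As an illustration of the array size stated above, the following minimal Python sketch enumerates every possible overhang of length n over the four bases. The function name all_overhangs is hypothetical and is used here only for illustration.

```python
from itertools import product

BASES = "ACGT"

def all_overhangs(n):
    """Return every possible single-stranded overhang of length n (4**n sequences)."""
    return ["".join(p) for p in product(BASES, repeat=n)]

pentamers = all_overhangs(5)
hexamers = all_overhangs(6)
print(len(pentamers), len(hexamers))  # 1024 4096
```

Each added overhang base multiplies the number of probes by four, which is why five and six base overhangs correspond to arrays of 1024 and 4096 probes.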

One variable is the length of the single-stranded overhang. The shorter
the overhang, the smaller the array of probes potentially useable. Overhangs of five and
six have been successfully employed. The nature of the support surface to which the
oligonucleotide is attached, the means of its attachment, and the length of the
oligonucleotide duplex are also important variables. Initially one 5' end-biotinylated
strand of the probe duplex is attached to a solid surface. The technology is already well
developed for the attachment of nucleic acids to solid supports, such as streptavidin-
coated magnetic microbeads and membranes such as the thin gel system.

Another variable is the nucleic acid capacity of the immobilized spot of
probe. This determines the detection sensitivity required and is also important where
unlabeled DNA may be present that could hybridize competitively with the desired
labeled DNA product. As depicted in Figure 2A, the 3' overhang of the array can
detect the 3'-terminal sequence of the target DNA. These will derive from 5'-end
labeled restriction fragments of known DNA sequence cut from vectors so that the
target for the immobilized probe will either be at the 3' end, just internal to it, or totally
internal. In some subsequent examples, it does not matter whether hybridization is
absolutely specific for the 3' end.

Alternatively, positional sequencing by hybridization of the 5'-end single-
stranded overhangs would be equally effective (Figure 2B). This permits reading of the
5' terminal sequence of the target DNA. However, this approach is not as versatile
because it does not allow for the use of polymerases to enhance the length and accuracy
of the sequence read.

Example 3 

Preparation of model arrays. Following the scheme shown in Figure 2, in
a single synthesis, all 1024 possible single-stranded probes with a constant 18 base stalk
followed by a variable 5 base extension can be created. The 18 base extension is
designed to contain two restriction enzyme cutting sites. Hga I generates a 5 base, 5'
overhang consisting of the variable bases N5. Not I generates a 4 base, 5' overhang at
the constant end of the oligonucleotide. The synthetic 23-mer mixture will be hybridized
with a complementary 18-mer to form a duplex which can then be enzymatically
extended to form all 1024 23-mer duplexes. These can be cloned by, for example, blunt
end ligation, into a plasmid which lacks Not I sites. Colonies containing the cloned 23-
base insert can be selected. Each should be a clone of one unique sequence. DNA
minipreps can be cut at the constant end of the stalk, filled in with biotinylated
pyrimidines, then cut at the variable end of the stalk, to generate the 5 base 5' overhang.
The resulting nucleic acid can be fractionated by Qiagen columns (nucleic acid
purification columns) to discard the high molecular weight material, and the nucleic acid
probe will then be attached to a streptavidin-coated surface. This procedure could
easily be automated in a Beckman Biomec or equivalent chemical robot to produce
many identical arrays of probes.

The initial array contains about a thousand probes. The particular
sequence at any location in the array will not be known. However, the array can be
used for statistical evaluation of the signal to noise ratio and the sequence
discrimination for different target molecules under different hybridization conditions.
Hybridization with known nucleic acid sequences allows for the identification of
particular elements of the array. A sufficient set of hybridizations would train the array
for any subsequent sequencing task. Arrays are partially characterized until they have
the desired properties. For example, the length of the oligonucleotide duplex, the mode
of its attachment to a surface, and the hybridization conditions used can all be varied,
using the initial set of cloned DNA probes. Once the sort of array that works best is
determined, a complete and fully characterized array can then be constructed by
ordinary chemical synthesis.

Example 4

Preparation of specific probe arrays. The major challenge for positional
SBH is to build real arrays of probes, and test the fraction of sequences that actually
perform according to expectations. Base composition and base sequence dependence
on the effectiveness of hybridization is probably the greatest obstacle to successful
implementation of these methods. The use of enzymatic steps, where feasible, may
simplify these problems, since, after all, the enzymes do manage to work with a wide
variety of DNA sequences in vivo. With positional SBH, one potential trick to
compensate for some variations in stability would be to allow the adjacent duplex to
vary. Thus, for an A+T rich overhang, one could use a G+C rich stacking duplex, and
vice versa.

Four methods for making arrays are tested and evaluated with two major
objectives. The first is to produce, rapidly and inexpensively, arrays that will test some
of the principles of positional SBH. The second is to develop effective methods for the
automated preparation of full arrays needed for production sequencing via positional
SBH. Since the first studies indicated that a five base overhang will be sufficient, arrays
may only have to have 1024 members. The cost of making all of these compounds is
actually quite modest. The constant portion of the probes can be made once, and then
extended in parallel, by automated DNA synthesis methods. In the simplest case, this
will require the addition of only 5 bases to each of 1024 compounds, which at typical
chemical costs of $2 per base will amount to a total of about $10,000.
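
The cost estimate above is simple arithmetic, sketched here in Python. The function name extension_cost is hypothetical; the inputs (1024 probes, 5 added bases, $2 per base) are the figures given in the text.

```python
def extension_cost(n_probes, bases_added, cost_per_base):
    """Reagent cost of extending every probe by the same number of bases."""
    return n_probes * bases_added * cost_per_base

# 1024 probes x 5 variable bases x $2 per base
print(extension_cost(4 ** 5, 5, 2.0))  # 10240.0
```

The exact product is $10,240, consistent with the rounded figure of about $10,000 quoted in the text.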

Moderately dense arrays can be made using a typical x-y robot to spot the
biotinylated compounds individually onto a streptavidin-coated surface. Using such
robots, it is possible to make arrays of 2 x 10^4 samples in 100 to 400 cm^2 of nominal
surface. The array should preferably fit in 10 cm^2, but even if forced, for unforeseen
technical reasons, to compromise on an array ten times or even 50 times less dense, it
will be quite suitable for testing the principles of and many of the variations on
positional SBH. Commercially available streptavidin-coated beads can be adhered
permanently to plastics like polystyrene by exposing the plastic first to a brief treatment
with an organic solvent like triethylamine. The resulting plastic surfaces have
enormously high biotin binding capacity because of the very high surface area that
results. This will suffice for radioactively labeled samples.

For fluorescently labeled samples, the background scattering from such a
bead-impregnated sample may interfere. In this case, a streptavidin-conjugated glass or
plastic surface may be utilized (commercially available from Bios Products). Surfaces
are made using commercially available amine-containing surfaces and using
commercially available biotin-containing N-hydroxysuccinimide esters to make stable
peptide conjugates. The resulting surfaces will bind streptavidin at one biotin binding
site (or at most two, but not more because the approximate 222 symmetry of the protein
would preclude this), which would leave other sites available for binding to biotinylated
oligonucleotides.

In certain experiments, the need for attaching oligonucleotides to surfaces
may be circumvented altogether, and oligonucleotides attached to streptavidin-coated
magnetic microbeads used as already done in pilot experiments. The beads can be
manipulated in microtitre plates. A magnetic separator suitable for such plates can be
used, including the newly available compressed plates. For example, the 18 by 24 well
plates (Genetix, Ltd.; USA Scientific Plastics) would allow containment of the entire
array in 3 plates; this format is well handled by existing chemical robots. It is
preferable to use the more compressed 36 by 48 well format, so that the entire array
would fit on a single plate. The advantages of this approach for all the experiments are
that any potential complexities from surface effects can be avoided, and already-existing
liquid handling, thermal control, and imaging methods can be used for all the
experiments. Thus, this allows the characterization of many of the features of positional
SBH before having to invest the time and effort in fabricating instruments, tools and
chips.
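
The plate counts quoted above follow from the well counts of the two formats. A minimal sketch, assuming one probe per well and a 1024-member array; the function name plates_needed is hypothetical.

```python
import math

def plates_needed(n_probes, rows, cols):
    """Number of plates required to hold n_probes at one probe per well."""
    return math.ceil(n_probes / (rows * cols))

print(plates_needed(1024, 18, 24))  # 3  (432 wells per plate)
print(plates_needed(1024, 36, 48))  # 1  (1728 wells per plate)
```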

Lastly, a rapid and highly efficient method to print arrays has been
developed. Master arrays are made which direct the preparation of replicas, or
appropriate complementary arrays. A master array is made manually (or by a very
accurate robot) by sampling a set of custom DNA sequences in the desired pattern and
then transferring these sequences to the replica. The master array is just a set of all
1024-4096 compounds. It is printed by multiple headed pipettes and compressed by
offsetting. A potentially more elegant approach is shown in Figure 14. A master array
is made and used to transfer components of the replicas in a sequence-specific way. The
sequences to be transferred are designed so that they contain the desired 5 or 6 base
5' variable overhang adjacent to a unique 15 base DNA sequence.

The master array consists of a set of streptavidin bead-impregnated plastic
coated metal pins, each of which, at its tip, contains immobilized biotinylated DNA
strands that consist of the variable 5 or 6 base segment plus the constant 15 base
segment. Any unoccupied sites on this surface are filled with excess free biotin. To
produce a replica chip, the master array is incubated with the complement of the 15
base constant sequence, 5'-labeled with biotin. Next, DNA polymerase is used to
synthesize the complement of the 5 or 6 base variable sequence. Then the wet pin array
is touched to the streptavidin-coated surface of the replica, held at a temperature above
the Tm of the complexes on the master array. If there is insufficient liquid carryover
from the pin array for efficient sample transfer, the replica array could first be coated
with spaced droplets of solvent (either held in concave cavities, or delivered by a
multiheaded pipettor). After the transfer, the replica chip is incubated with the
complement of the 15 base constant sequence to reform the double-stranded portions of
the array. The basic advantage of this scheme, if it can be realized, is that the master
array and transfer compounds are made only once, and then the manufacture of replica
arrays should be able to proceed almost endlessly.

Example 5

DNA ligation to oligonucleotide arrays. Following the schemes shown in
Figures 3A and 3B, E. coli and T4 DNA ligases can be used to covalently attach
hybridized target nucleic acid to the correct immobilized oligonucleotide probe. This
is a highly accurate and efficient process. Because ligase absolutely requires a correctly
base paired 3' terminus, ligase will read only the 3'-terminal sequence of the target
nucleic acid. After ligation, the resulting duplex will be 23 base pairs long and it will
be possible to remove unhybridized, unligated target nucleic acid using fairly stringent
washing conditions. Appropriately chosen positive and negative controls demonstrate
the power of this scheme, such as arrays which are lacking a 5'-terminal phosphate
adjacent to the 3' overhang, since these probes will not ligate to the target nucleic acid.

There are a number of advantages to a ligation step. Physical specificity
is supplanted by enzymatic specificity. Focusing on the 3' end of the target nucleic acid
also minimizes problems arising from stable secondary structures in the target DNA. As
shown in Figure 3B, ligation can be used to enhance the fidelity of detecting the 5'-
terminal sequence of a target DNA.

DNA ligases are also used to covalently attach hybridized target DNA to
the correct immobilized oligonucleotide probe. Several tests of the feasibility of the
ligation scheme shown in Figure 3 were carried out. Biotinylated probes were attached
to streptavidin-coated magnetic microbeads, and annealed with a shorter, complementary,
constant sequence to produce duplexes with 5 or 6 base single-stranded overhangs. One
set of actual sequences used is shown in Example 14. 5'-end labeled targets were
allowed to hybridize to the probes. Free targets were removed by capturing the beads
with a magnetic separator. DNA ligase was added and ligation was allowed to proceed at
various salt concentrations. The samples were washed at room temperature, again
manipulating the immobilized compounds with a magnetic separator. This should
remove non-ligated material. Finally, samples were incubated at a temperature above
the Tm of the duplexes, and eluted single strand was retained after the remainder of the
samples were removed by magnetic separation. The eluate at this point should consist
of the ligated material. The fraction of ligation was estimated as the amount of 32P
recovered in the high temperature wash versus the amount recovered in both the high
and low temperature washes. Results obtained are shown in Figure 13. It is apparent
that salt conditions can be found where the ligation proceeds efficiently with perfectly
matched 5 or 6 base overhangs, but not with G-T mismatches.
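
The ligation fraction described above is the label recovered in the high temperature wash divided by the total label recovered in both washes. A minimal sketch follows; the function name and the counts in the usage line are hypothetical and are not data from the experiments.

```python
def ligation_fraction(high_temp_counts, low_temp_counts):
    """Fraction ligated = high-temperature counts / (high + low temperature counts)."""
    total = high_temp_counts + low_temp_counts
    if total <= 0:
        raise ValueError("no label recovered in either wash")
    return high_temp_counts / total

# Hypothetical counts chosen only to illustrate the calculation.
print(round(ligation_fraction(1700, 8300), 3))  # 0.17
```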

The results of a more extensive set of similar experiments are shown in
Tables 2-4. Table 2 looks at the effect of the position of the mismatch and Table 3
examines the effect of base composition on the relative discrimination of perfect
matches versus weakly destabilizing mismatches. These data demonstrate that (1)
effective discrimination between perfect matches and single mismatches occurs with all
five base overhangs tested; (2) there is little if any effect of base composition on the
amount of ligation seen or the effectiveness of match/mismatch discrimination. Thus,
the serious problems of dealing with base composition effects on stability seen in
ordinary SBH do not appear to be a problem for positional SBH; (3) the worst
mismatch position is, as expected, the one distal from the phosphodiester bond formed
in the ligation reaction. However, any mismatches that survive in this position will be
eliminated by a polymerase extension reaction, such as described herein, provided that
a polymerase is used, like Sequenase version 2, that has no 3'-exonuclease activity or
terminal transferase activity; and (4) gel electrophoresis analysis has confirmed that the
putative ligation products seen in these tests are indeed the actual products synthesized.

Table 2

Ligation Efficiency of Matched and Mismatched Duplexes
in 0.2 M NaCl at 37°C

(SEQ ID NO 1)   3'-TCG AGA ACC TTG GCT-5'

                                          Ligation Efficiency

      CTA CTA GGC TGC GTA GTC-5'                     (SEQ ID NO 2)
5'-B- GAT GAT CCG ACG CAT CAG AGC TC      0.170      (SEQ ID NO 3)
5'-B- GAT GAT CCG ACG CAT CAG AGC TT      0.006      (SEQ ID NO 4)
5'-B- GAT GAT CCG ACG CAT CAG AGC TA      0.006      (SEQ ID NO 5)
5'-B- GAT GAT CCG ACG CAT CAG AGC CC      0.002      (SEQ ID NO 6)
5'-B- GAT GAT CCG ACG CAT CAG AGT TC      0.004      (SEQ ID NO 7)
5'-B- GAT GAT CCG ACG CAT CAG AAC TC      0.001      (SEQ ID NO 8)

Table 3

Ligation Efficiency of Matched and Mismatched Duplexes
in 0.2 M NaCl at 37°C and its Dependence on AT Content of the Overhang

            Overhang Sequence    AT Content    Ligation Efficiency

Match       GGCCC                0/5           0.30
Mismatch    GGCCT                              0.03

Match       AGCCC                1/5           0.36
Mismatch    AGCTC                              0.02

Match       AGCTC                2/5           0.17
Mismatch    AGCTT                              0.01

Match       AGATC                3/5           0.24
Mismatch    AGATT                              0.01

Match       ATATC                4/5           0.17
Mismatch    ATATT                              0.01

Match       ATATT                5/5           0.31
Mismatch    ATATC                              0.02

Table 4

Increasing Discrimination by Sequencing Extension at 37°C

                                          Ligation       Ligation Extension (cpm)
                                          Efficiency
                                          (percent)      (+)            (-)

(SEQ ID NO 1)   3'-TCG AGA ACC TTG GCT-5'
      CTA CTA GGC TGC GTA GTC-5' (SEQ ID NO 2)
5'-B- GAT GAT CCG ACG CAT CAG AGA TC      0.24           [illegible]    29,400
5'-B- GAT GAT CCG ACG CAT CAG AGC TT      [illegible]    [illegible]    [illegible]
      (SEQ ID NO 10)
Discrimination =                          [illegible]    x42            x118

(SEQ ID NO 1)   3'-TCG AGA ACC TTG GCT-5'
      CTA CTA GGC TGC GTA GTC-5' (SEQ ID NO 2)
5'-B- GAT GAT CCG ACG CAT CAG ATA TC      0.17           [illegible]    [illegible]
5'-B- GAT GAT CCG ACG CAT CAG ATA TT      [illegible]    [illegible]    [illegible]
      (SEQ ID NO 12)
Discrimination =                          [illegible]    x51            x65

The discrimination for the correct sequence is not as great with an
external mismatch (which would be the most difficult case to discriminate) as with an
internal mismatch (Table 4). A mismatch right at the ligation point would presumably
offer the highest possible discrimination. In any event, the results shown are very
promising. Already there is a level of discrimination with only 5 or 6 bases of overlap
that is better than the discrimination seen in conventional SBH with 8 base overlaps.
Allele-specific amplification by the ligase chain reaction also appears to be quite
successful (F. Baranay et al., Proc. Natl. Acad. Sci. USA 88:189-93, 1991).
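
To make the level of discrimination concrete, the ligation efficiencies of Table 2 can be converted into match/mismatch ratios. The sketch below uses the Table 2 values, keyed by the five base overhang of each probe; the variable and function names are hypothetical.

```python
# Ligation efficiencies from Table 2, keyed by the 5-base probe overhang.
MATCH = ("AGCTC", 0.170)
MISMATCHES = {
    "AGCTT": 0.006,
    "AGCTA": 0.006,
    "AGCCC": 0.002,
    "AGTTC": 0.004,
    "AACTC": 0.001,
}

def discrimination(match_eff, mismatch_eff):
    """Fold preference for the perfectly matched probe over a mismatched one."""
    return match_eff / mismatch_eff

for overhang, eff in MISMATCHES.items():
    print(overhang, round(discrimination(MATCH[1], eff), 1))
```

On these values the ratios range from roughly 28-fold for a mismatch at the end of the overhang to 170-fold for a mismatch nearer the duplex.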

Example 6 

Positional sequencing by hybridization with a nested set of DNA samples.
Thus far described arrays have been very inefficiently utilized because with only a single
target nucleic acid, only a single probe will be detected. This clearly wastes most of the
potential information intrinsically available from the array. A variation in the
procedures will use the array much more efficiently. This is illustrated in Figure 6.
Here, before hybridization to the probe array, the 5'-labeled (or unlabeled) target
nucleic acid is partially degraded with an enzyme such as exonuclease III. Digestion
produces a large number of molecules with a range of chain lengths that share a
common 5'-terminus, but have a variable 3'-terminus. This entire family of nucleic acids
is then hybridized to the probe array. Assuming that the distribution of 3'-ends is
sufficiently broad, the hybridization pattern should allow the sequence of the entire
target to be read subject to any branch point ambiguities. If a single set of exonuclease
conditions fails to provide a broad enough distribution, samples prepared under several
different conditions could be combined.
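
The idea of reading a whole target from a nested set can be sketched in a few lines of Python. This is a simplified model, not the procedure of the invention: it assumes an ideal ladder containing every 3'-truncated fragment, error-free detection of each fragment's 3'-terminal pentamer, and a target with no recurring 4-base overlap. All function names and the example sequence are illustrative.

```python
def nested_fragments(target, min_len):
    """All fragments sharing the 5' end, as from an idealized 3' exonuclease ladder."""
    return [target[:k] for k in range(min_len, len(target) + 1)]

def terminal_words(fragments, n):
    """The 3'-terminal n-mer of each fragment, i.e. the word detected on the array."""
    return {frag[-n:] for frag in fragments}

def assemble(words, start):
    """Chain n-mers by (n-1)-base overlap from a known starting word.

    Stops at the first branch point (no successor, or more than one)."""
    seq = start
    remaining = set(words) - {start}
    while True:
        suffix = seq[-(len(start) - 1):]
        successors = [w for w in remaining if w.startswith(suffix)]
        if len(successors) != 1:
            break
        seq += successors[0][-1]
        remaining.discard(successors[0])
    return seq

target = "GATGATCCGACGCATCAG"
words = terminal_words(nested_fragments(target, 5), 5)
print(assemble(words, target[:5]) == target)  # True
```

When a word recurs in the target, the chain stops at a branch point, which is the ambiguity addressed by the positional information of Examples 7 and 8.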

There are at least three ways to make nested DNA deletions suitable for
positional SBH. The easiest, but ultimately probably the least satisfactory, is to use
an exonuclease like exonuclease III, by analogy to nested deletion cloning in ordinary
sequencing (S. Henikoff, Gene 28:351-58, 1984). The difficulty with these enzymes is
that they may not produce an even enough yield of compounds to fully represent the
sample of interest. One sees a pattern of regions in the sequence where the enzyme
moves relatively rapidly, and others where it moves relatively slowly. Several
commercially available enzymes can be examined by looking at the distribution of
fragment lengths directly on ordinary polyacrylamide DNA sequencing gels.

The second approach to making nested samples is to use the ordinary
Maxam-Gilbert sequencing chemistry. It is possible to ligate the 5'-phosphorylated
fragments which result from these chemical degradations. Indeed this is the principle
used for ligation-mediated genomic DNA sequencing (G.J. Pfiefer et al., Sci. 246:810-13,
1989). Asymmetric PCR or linear amplification can be used to make the
complementary, ligatable, nested strands. A side benefit of this approach is that one can
pre-select which base to cleave after, and this provides additional information about the
DNA sequences one is working with.

The third approach to making nested samples is to use variants on
plus/minus sequencing. For example, one can make a very even DNA sequencing
ladder by using Sanger sequencing with a dideoxy-pppN terminator. This does not
produce a ligatable end. However, it can be replaced with a ligatable end, while still on
the original template, by first removing the ddpppN with the 3' editing-exonuclease
activity of DNA polymerase I in the absence of the one particular base at the end. Note
that this accomplishes two things for the price of one. Not only does it generate a
ladder with a ligatable end but, because one can pre-determine the identity of the base
removed, it provides an additional nucleotide of DNA sequence information. One can
use single color detection in four separate reactions, or ultimately, four color detection
by mixing the results of four separate reactions prior to hybridization. If this approach
is successful, it is amenable to more elaborate variations combining laddering and
hybridization. Note that each of these procedures combines some of the power of
ladder sequencing with the parallel processing of SBH.

In addition, there are alternative methods of preparing the desired
samples, such as polymerization in the absence of limiting amounts of one of the
substrate bases, such as for DNA, one of the four dNTPs. Standard Sanger or Maxam-
Gilbert sequencing protocols cannot be used to generate the ladder of DNA fragments
because these techniques fail to yield 3'-ligatable ends. In contrast, sequencing by the
method of the present invention combines the techniques and advantages of the power
of ladder sequencing with the parallel processing power of positional sequencing by
hybridization.

Ligation ensures the fidelity of detection of the 3' terminal base of the
target DNA. To ensure similar fidelity of detection at the 5' end of the duplex formed
between the probe and the target, the probe-target duplex can be extended after ligation
by one nucleotide using, for example, a labeled ddNTP (Figure 5). This has two major
advantages. First, specificity is increased because extension with the Klenow fragment
of DNA polymerase requires a correctly base paired 3'-primer terminus. Second, using
labeled ddNTPs one at a time, or a mixture of all four labeled with four different colors
simultaneously, the identity of one additional nucleotide of the target nucleic acid can
be determined as shown in Figure 5. Thus, an array of only 1024 probes would actually
have the sequencing power of an array of 4096 hexamers; in other words, there is a
corresponding four-fold gain for any length used. In addition, polymerases work well
in solid state sequencing methodologies quite analogous to the type proposed herein.
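
The four-fold gain can be checked by counting distinguishable outcomes: each probe combined with each of the four possible extended bases reports a distinct word one base longer than the overhang. A minimal sketch, with a hypothetical function name:

```python
def sequencing_power(overhang_length, extended_bases=1):
    """Number of distinct words read by an array of 4**overhang_length probes
    when each probe-target duplex is extended by extended_bases labeled bases."""
    n_probes = 4 ** overhang_length
    return n_probes * 4 ** extended_bases

print(sequencing_power(5))     # 4096, the size of a full hexamer array
print(sequencing_power(5, 0))  # 1024, the array without extension
```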

Example 7

Retaining positional information in sequencing by hybridization. Inherent
in the detection of just the 3'-terminal sequence of the target nucleic acid is the
possibility of obtaining information about the distance between the sequence hybridized
and a known reference point. Although that point could be arbitrary, the 5'-end of the
intact target was used. The desired distance is then just the length of the DNA
fragment that has hybridized to a particular probe in the array. In principle, there are
two ways to determine this length. One is to length fractionate (5' labeled) DNA before
or after the hybridization, ligation, and any DNA polymerase extension. Single DNA
sequences could be used, but pools of many DNA targets used simultaneously or,
alternatively, a double-labeled target with one color representing the 5'-end of any
unique site and the other a random internal label would be more efficient. For
example, one can incorporate into the target a fractional amount, about 1%, of
biotinylated (or digoxigenin-labeled) pyrimidines, and use this later on for fluorescent
detection. It has been recently shown that an internal label is effective in high
sensitivity conventional ladder DNA sequencing. The ratio of the internal label to the
end label is proportional to target fragment length. For any particular sample the
relationship is monotonic even though it may be irregular. Thus, correct order is always
obtained even if distances are occasionally distorted by extreme runs of purines or
pyrimidines. If necessary, it is also possible to use two quasi-independent internal
labeling schemes.
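
The monotonic relationship between label ratio and fragment length can be illustrated with a simple model. The sketch assumes that each fragment carries exactly one 5' end label and that a fixed fraction (1% here, as in the text) of its pyrimidines carries the internal label, so only expected values are computed. The function name and example sequence are hypothetical.

```python
def expected_label_ratio(fragment, label_fraction=0.01):
    """Expected internal labels per end label for a 5'-end-labeled fragment,
    assuming a fixed fraction of pyrimidines (C, T) is internally labeled."""
    pyrimidines = sum(base in "CT" for base in fragment)
    return label_fraction * pyrimidines

target = "GATGATCCGACGCATCAGAGCTC"
ladder = [target[:k] for k in range(1, len(target) + 1)]
ratios = [expected_label_ratio(f) for f in ladder]

# The ratio never decreases as the fragment gets longer, so order is preserved;
# it stays flat across runs of purines, which distorts apparent distances.
print(all(a <= b for a, b in zip(ratios, ratios[1:])))  # True
```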

The scheme as just outlined, used with polymerase extension, might
require as many as 6 different colored labels: 2 on the target (5' and internal) and four
on the probe extension (four ddNTPs). However, the 5' label is unnecessary, since the
3' extension provides the same information (providing that the DNA polymerase
reaction is close to stoichiometric). The ddNTPs can be used one at a time if necessary.
Therefore, the scheme could proceed with as little as two color detection, if necessary
(Figure 7), and three colors would certainly suffice.

A scheme complementary to that shown in Figure 7 would retain positional
information while reading the 5'-terminal sequence of 3'-end labeled plus internally
labeled target nucleic acids. Here, as in Figure 3B, probe arrays with 5' overhangs are
used; however, polymerase extension will not be possible.

Example 8

Resolution of branch point ambiguities. In current SBH, branch point
ambiguities caused by sequence recurrences effectively limit the size of the target DNA
to a few hundred base pairs. The positional information described in Section 6 will
resolve many of these ambiguities. When a sequence recurrence occurs, if a complete
DNA ladder is used as the sample, two or more targets will hybridize to the same probe.
Single nucleotide additions will be informative in 3/4 of the cases where two targets are
ligated to the same probe; they will reveal that a given probe contains two different
targets and will indicate the sequence of one base outside the recurrence. The easiest
way to position the two recurrent sequences is to eliminate the longer or shorter
members of the DNA ladder and hybridize remaining species to the probe array. This
is a sufficiently powerful approach that it is likely to be a routine feature of positional
SBH. Recurrences will be very frequent with only 5 or 6 base overhangs, but the use
of segmented ladders will allow most of these to be resolved in a straightforward way.
It should not be necessary to physically fractionate the DNA species of the ladder
(although this could certainly be done if needed). Instead, one can cut an end-labeled
ladder with a restriction nuclease. For an effective strategy seven 4-base specific
enzymes should be used, singly or in combination.

Additional information is available for the recurrence of pentanucleotide
sequences by the use of polymerase and single base extension as described in Example
7. In three cases out of four the single additional base will be different for the two
recurrent sequences. Thus, it will be clear that a recurrence has occurred.
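
The three-in-four figure follows from counting base pairs, under the assumption (made here for illustration, not stated in the text) that the bases flanking the two copies of a recurrent sequence are independent and equally likely to be A, C, G or T.

```python
from fractions import Fraction
from itertools import product

# All 16 combinations of the bases flanking the two copies of a recurrence.
pairs = list(product("ACGT", repeat=2))
informative = sum(a != b for a, b in pairs)

fraction_informative = Fraction(informative, len(pairs))
print(fraction_informative)  # 3/4
```

In the remaining quarter of cases the flanking bases are identical and the single base extension alone does not reveal the recurrence.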

The real power of the positional information comes, not from its
application to the recurrent sequences, but from its application to surrounding unique
sequences. Their order will be determined unequivocally, assuming even moderately
accurate position information, and thus, the effect of the branch point will be
eliminated. For example, 10% accuracy in intensity ratios for a dual labeled 200 base
pair target will provide a positional accuracy of 20 base pairs. This would presumably
be sufficient to resolve all but the most extraordinary recurrences.

Branch point ambiguities are caused by sequence recurrence and
effectively limit the size of the target nucleic acid to a few hundred base pairs.
However, positional information derived from Example 7 will resolve almost all of these
ambiguities. If a sequence recurs, more than one target fragment will hybridize to, or
otherwise be detected by subsequent ligation to or extension from, a single immobilized
probe. The apparent position of the target will be its average on the recurrent
sequence. For a sequence which occurs just twice, the true location is symmetric around
the apparent one. For example, the apparent position of a recurrent sequence occurring
in positions 50 and 100 bases from the 5'-end of the target will be 75 bases from the
end. However, when the pattern of positional sequencing by hybridization is examined,
a sequence putatively located at that position will show overlap with contacts in the
neighborhood of 50 bases and 100 bases from the 5'-end. This will indicate that a
repeat has occurred.
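
The two numerical examples above (a 20 base pair positional accuracy and an apparent position of 75 bases) can be reproduced with a short calculation. The sketch assumes equal signal from each copy of a recurrent sequence, so that the apparent position is the simple mean; the function names are hypothetical.

```python
def positional_accuracy(target_length, ratio_accuracy):
    """Positional uncertainty, in bases, for a given fractional accuracy of the
    internal-label to end-label intensity ratio."""
    return target_length * ratio_accuracy

def apparent_position(true_positions):
    """Apparent position of a recurrent sequence: the mean of its true positions,
    assuming each copy contributes equal signal."""
    return sum(true_positions) / len(true_positions)

print(round(positional_accuracy(200, 0.10), 6))  # 20.0
print(apparent_position([50, 100]))              # 75.0
```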

Example 9 

Extending the 3'-sequence of the target. Using the scheme shown in
Figure 8, it is possible to learn the identity of the base 3' to the known sequence of the
target, as revealed by its hybridization position on an oligonucleotide array. For
example, an array of 4^n single-stranded overhangs of the type NAGCTA 3', as shown in
the Figure, is created, wherein n is the number of known bases in an overhang of
length n+1. The target is prepared by using a 5' label in the manner shown in Figure
3. The Klenow fragment of DNA polymerase would then be used to add a single
dpppNp as a polymerization chain terminator (or alternatively, ddpppN terminators plus
ligatable ends). Before hybridization the resulting 3'-terminal phosphate would be
removed by alkaline phosphatase. This would allow subsequent ligation of the target
to the probe array. Either by four successive single color 5' labels, or a mixture of four
different colored chains, each color corresponding to a particular chain terminator, one
would be able to infer the identity of the base that had paired with the N next to the
sequence AGCTA. Labeling of the 5' end minimizes interference of fluorescent base
derivatives on the ligation step. Presumably, provided with a supply of dpppNp, or ribo-
pppNp which can be easily prepared, the Sequenase version 2 or another known
polymerase will use these as a substrate. The key step in this scheme is to add a single
dpppNp as a polymerization chain terminator. Before hybridization, the resulting 3'
terminal phosphate is removed by alkaline phosphatase. This allows for the subsequent
ligation of the target to the probe array. Alternatively, ddpppNp terminators replaced
with ligatable ends may also be used. Either by four successive single color 5' labels,
or a mixture of four different colored chains, each color representing a specific chain
terminator, one is able to infer the identity of the base that had paired with the N next
to the sequence AGCTA. The 5' end is labeled to minimize interference of fluorescent-
based derivatives with the ligation step.

Assuming that there are sufficient colors in a polychromatic detection
scheme, this 3' target extension can be combined with the 3' probe extension to read
n+2 bases in an array of complexity 4^n. This is potentially quite a substantial
improvement. It decreases the size of the array needed by a factor of 16 without any
loss in sequencing power. However, the number of colors required begins to become
somewhat daunting. In principle one would want at least nine, four for each 3'
extension and one general internal label for target length. However, with resonance
ionization spectroscopy (RIS) detection, eight colors are available with just a single type
of metal atom, and many more could be had with just two metals.
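
The factor of 16 and the count of nine colors follow from simple counting, which may be sketched as below. The function names and default arguments are illustrative assumptions only.

```python
def array_size(n_bases):
    """Number of probes in a complete array covering n_bases positions."""
    return 4 ** n_bases


def size_reduction(extra_bases_read):
    """Factor by which the array shrinks when extra bases are read
    enzymatically instead of being encoded in the probes."""
    return array_size(extra_bases_read)


def colors_required(extensions=2, labels_per_extension=4, internal_labels=1):
    """Four colors per 3' extension plus one internal label for target length."""
    return extensions * labels_per_extension + internal_labels


if __name__ == "__main__":
    n = 5
    # Reading n+2 bases with an array of complexity 4^n:
    print(array_size(n + 2) // array_size(n))  # 16
    print(size_reduction(2))                   # 16
    print(colors_required())                   # 9
```
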

Example 10 

Extending the 5' sequence of the target. In Example 5, it was illustrated
that by polymerase extension of the 3'-end of the probe, a single additional nucleotide
on the target could be determined after ligation. That procedure used only chain
terminators. Fluorescent labeled dNTPs that serve as substrates for DNA polymerase
and other enzymes of DNA metabolism can also be made. The probe-target complex
of each ligation reaction could be extended using, for example, three fluorescent
labeled dNTPs and a fourth unlabeled chain terminator. This could be
repeated, successively, with each possible chain terminator. If the ratio of the intensities
of the different labels can be measured fairly accurately, a considerable amount of
additional sequence information will be obtained. If the absolute intensities could be
measured, the power of the method appears to be very substantial since one is in
essence doing a bit of four color DNA sequencing at each site on the oligonucleotide
array. For example, as shown in Figure 9, for the sequence (Pu)4T, such an approach
would unambiguously reveal 12 out of the 16 possible sequences and the remainder
would be divided into two ambiguous pairs. Alternatively, once the probe array
has captured target DNAs, full plus-minus DNA sequencing reactions could be carried
out on all targets. Single nucleotide DNA addition methods have been described that
would also be suitable for such a highly parallelized implementation.

Example 11 

Sample pooling in positional sequencing by hybridization. A typical 200
base pair target will detect only 196 probes on a five base 1024 probe array. This is not
far from the ideal case in single, monochromatic sampling where one might like to
detect half the probes each time. However, as the procedure is not restricted to single
colors, the array is not necessarily this small. With an octanucleotide array, in
conventional positional sequencing by hybridization or one of its herein described
enhancements, the target detects only 1/32 of the immobilized probes. To increase
efficiency a mixture of 16 targets can be used with two enhancements. First, intelligently
constructed orthogonal pools of probes can be used for mapping by hybridization.
Hybridization sequencing with these pools would be straightforward. Pools of targets,
pools of probes, or pools of both can be used.

Second, in the analysis by conventional sequencing by hybridization of an
array of 2 x 10^7 probes, divided into as few as 24 pools containing 8 x 10^6 probes each,
there is a great deal of redundancy. Excluding branch points, 24 hybridizations could
determine all the nucleic acid sequences of all the targets. However, using RIS
detection there are many more than 24 colors. Therefore, all the hybridizations plus
appropriate controls could be done simultaneously, provided that the density of the
nucleic acid sample were high enough to keep target concentration far in excess of all
the probes. A single hybridization experiment could produce 4 x 10^6 base pairs of
sequence information. An efficient laboratory could perform 25 such hybridizations in
a day, resulting in a throughput of 10^8 base pairs of sequence per day. This is
comparable to the speed of polymerization by E. coli DNA polymerase.
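
The figure of 196 probes for a 200 base target on a five base array is the number of overlapping 5-base windows in the target. A minimal sketch follows; the function names are hypothetical, and the count is an upper bound that assumes every window in the target is distinct.

```python
def max_probes_detected(target_length, probe_length):
    """Upper bound on probes detected: one per overlapping window."""
    return max(target_length - probe_length + 1, 0)


def distinct_kmers(target, k):
    """Set of distinct k-base windows actually present in a target sequence."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def fraction_of_array(target_length, probe_length):
    """Fraction of a complete 4^k array that a target can detect at most."""
    return max_probes_detected(target_length, probe_length) / 4 ** probe_length


if __name__ == "__main__":
    print(max_probes_detected(200, 5))          # 196
    print(round(fraction_of_array(200, 5), 3))  # 0.191
    print(len(distinct_kmers("ACGTACGT", 5)))   # 4
```
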

Example 12 

Oligonucleotide ligation after target hybridization. Stacking hybridization
without ligation has been demonstrated in a simple format. Eight-mer oligonucleotides
were annealed to a target and then annealed to an adjacent 5-mer to extend the
readable sequence from 8 to 13 bases. This is done with small pools of 5-mers
specifically chosen to resolve ambiguities in sequence data that has already been
determined by ordinary sequencing by hybridization using 8-mers alone. The method
appears to work quite well, but it is cumbersome because a custom pool of 5-mers must
be created to deal with each particular situation. In contrast, the approach taken herein
(Figure 9), after ligation of the target to the probe, is to ligate a mixture of 5-mers
arranged in polychromatically labeled orthogonal pools. For example, using 5-mers of
the form pATGCAp or pATGCddA, only a single ligation event will occur with each
probe-target complex. These would be 3' labeled to avoid interference with the ligase.
Only ten pools are required for a binary sieve analysis of 5-mers. In reality it would
make sense to use many more, say 16, to introduce redundancy. If only four colors are
available, this would require four successive hybridizations. Sixteen
colors would allow a single hybridization. But the result of this scheme is that one reads
ten bases per site in the array, equivalent to the use of 4^10 probes, but one only has to
make 2 x 4^5 probes. The gain in efficiency in this scheme is a factor of 500 over
conventional sequencing by hybridization.
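
One way to picture a binary sieve over the 1024 possible 5-mers is sketched below. This is an assumed construction for illustration, not necessarily the pooling used in practice: pool j holds every 5-mer whose index has bit j set, so ten yes/no signals identify a single ligated 5-mer. All names are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def all_kmers(k):
    """All 4^k sequences of length k in a fixed order."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def build_pools(kmers):
    """Pool j contains every k-mer whose list index has bit j set."""
    n_pools = (len(kmers) - 1).bit_length()
    return [{km for i, km in enumerate(kmers) if (i >> j) & 1}
            for j in range(n_pools)]


def decode(signal_bits, kmers):
    """Recover the k-mer from the per-pool signals (1 = ligation observed)."""
    index = sum(bit << j for j, bit in enumerate(signal_bits))
    return kmers[index]


if __name__ == "__main__":
    kmers = all_kmers(5)
    pools = build_pools(kmers)
    print(len(pools))                          # 10
    bits = [int("ATGCA" in pool) for pool in pools]
    print(decode(bits, kmers))                 # ATGCA
    print(4 ** 10 // (2 * 4 ** 5))             # 512
```

With exactly ten pools the all-zero pattern is indistinguishable from no ligation at all, which is one reason the redundancy mentioned above (for example 16 pools) is attractive. The last line reproduces the efficiency gain, 4^10 / (2 x 4^5) = 512, or roughly 500.
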

Example 13

Synthesis of custom arrays of probes. Custom arrays of probes would be
useful to detect a change in nucleic acid sequence, such as any single base change in a
pre-selected large population of sequences. This is important for detecting mutations,
for comparative sequencing, and for finding new, potentially rare polymorphisms. One
set of target sequences can be customized to an initial general array of nucleic acid
probes to turn the probe into a specific detector for any alterations of a particular
sequence or series of sequences. The initial experiment is the same as outlined above
in Example 4, except that the 3'-blocked 5-mers are unlabeled. After the ligation, the
initial nucleic acid target strand along with its attached 18 nucleotide stalk is removed,
and a new unligated 18 nucleotide stalk annealed to each element of the immobilized
array (Figure 11). The difference is that because of its history, many (ideally 50% or
more) of the elements of that array now have 10 base 3' extensions instead of 5 base
extensions. These do not represent all 4^10 possible 10-mers, but instead represent just
those 10-mers which were present in the original sample. A comparison sample can
now be hybridized to the new array under conditions that detect single mismatches in
a decanucleotide duplex. Any samples which fail to hybridize are suspects for altered
bases.

A problem in large scale diagnostic DNA sequencing is handling large
numbers of samples from patients. Using the approach just outlined, a third or a fourth
cycle of oligonucleotide ligation could be accomplished, creating an array of 20-mers
specific for the target sample. Such arrays would be capable of picking up unique
segments of genomic DNA in a sequence specific fashion and detecting any differences
in them in sample comparisons. Each array could be custom designed for one
individual, without any DNA sequence determination and without any new
oligonucleotide synthesis. Any subsequent changes in that individual's DNA, such as
those caused by oncogenesis or environmental insult, might be easily detectable.

Example 14 

Positional sequencing by hybridization. Hybridization was performed using
probes with five and six base pair overhangs, including a five base pair match, a five
base pair mismatch, a six base pair match, and a six base pair mismatch. These
sequences are depicted in Table 5.

Table 5

Test Sequences:

5 bp overlap, perfect match:
                         3'-TCGAGAACCTTGGCT-5'  (SEQ ID NO 1)
       3'-CTACTAGGCTGCGTAGTC-5'                 (SEQ ID NO 2)
5'-biotin-GATGATCCGACGCATCAGAGCTC-3'            (SEQ ID NO 3)

5 bp overlap, mismatch at 3' end:
                         3'-TCGAGAACCTTGGCT-5'  (SEQ ID NO 1)
       3'-CTACTAGGCTGCGTAGTC-5'                 (SEQ ID NO 2)
5'-biotin-GATGATCCGACGCATCAGAGCTT-3'            (SEQ ID NO 4)

6 bp overlap, perfect match:
                         3'-TCGAGAACCTTGGCT-5'  (SEQ ID NO 1)
       3'-CTACTAGGCTGCGTAGTC-5'                 (SEQ ID NO 2)
5'-biotin-GATGATCCGACGCATCAGAGCTCT-3'           (SEQ ID NO 13)

6 bp overlap, mismatch four bases from 3' end:
                         3'-TCGAGAACCTTGGCT-5'  (SEQ ID NO 1)
       3'-CTACTAGGCTGCGTAGTC-5'                 (SEQ ID NO 2)
5'-biotin-GATGATCCGACGCATCAGAGTTCT-3'           (SEQ ID NO 14)

The biotinylated double-stranded probe was prepared in TE buffer by
annealing the complementary single strands together at 68°C for five minutes followed
by slow cooling to room temperature. A five-fold excess of monodisperse, polystyrene-
coated magnetic beads (Dynal) coated with streptavidin was added to the double-
stranded probe, which was then incubated with agitation at room temperature for 30
minutes. After ligation, the samples were subjected to two cold (4°C) washes followed
by one hot (90°C) wash in TE buffer (Figure 12). The ratio of label in the hot
supernatant to the total amount of label was determined (Figure 13). At high NaCl
concentrations, mismatched target sequences were either not annealed or were removed
in the cold washes. Under the same conditions, the matched target sequences were
annealed and ligated to the probe. The final hot wash removed the non-biotinylated
probe oligonucleotide. This oligonucleotide contained the labeled target if the target
had been ligated to the probe.
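
The match and mismatch designations of Table 5 can be checked mechanically. The sketch below is illustrative only: the sequences are taken from the sequence listing (SEQ ID NOS 1, 2, 3, 4, 13 and 14) with apparent scanning errors corrected, which is an assumption, and the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def overhang(long_strand, short_strand):
    """3' overhang of long_strand left single-stranded when short_strand
    pairs with its 5' portion. Both strands are written 5' to 3'."""
    duplex = len(short_strand)
    if reverse_complement(short_strand) != long_strand[:duplex]:
        raise ValueError("short strand does not pair with the long strand")
    return long_strand[duplex:]


def mismatches_from_probe_3prime(target, probe_overhang):
    """Positions (1 = probe 3'-terminal base) where the 3' end of the target
    fails to pair with the probe overhang."""
    expected = reverse_complement(probe_overhang)
    tail = target[-len(expected):]
    return [i + 1 for i, (t, e) in enumerate(zip(tail, expected)) if t != e]


TARGET = "TCGGTTCCAAGAGCT"        # SEQ ID NO 1
SHORT = "CTGATGCGTCGGATCATC"      # SEQ ID NO 2
PROBES = {
    "5 bp match": "GATGATCCGACGCATCAGAGCTC",      # SEQ ID NO 3
    "5 bp mismatch": "GATGATCCGACGCATCAGAGCTT",   # SEQ ID NO 4
    "6 bp match": "GATGATCCGACGCATCAGAGCTCT",     # SEQ ID NO 13
    "6 bp mismatch": "GATGATCCGACGCATCAGAGTTCT",  # SEQ ID NO 14
}

if __name__ == "__main__":
    for name, probe in PROBES.items():
        print(name, mismatches_from_probe_3prime(TARGET, overhang(probe, SHORT)))
```

Under these assumptions the two matched probes give no mismatches, SEQ ID NO 4 gives a mismatch at the probe 3'-terminal base, and SEQ ID NO 14 gives a mismatch four bases from the probe 3' end, in agreement with the headings of Table 5.
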

Example 15

Compensating for variations in base composition. A major problem in all
suggested implementations of SBH is the rather marked dependence of Tm on base
composition, and, at least in some cases, on base sequence. The use of unusual salts
like tetramethyl ammonium halides or betaines (W.A. Rees et al., Biochemistry 32:137-44,
1993) offers one approach to minimizing these variations. Alternatively, base analogs like
2,6-diamino purine and 5-bromo U can be used instead of A and T, respectively, to
increase the stability of A-T base pairs, and derivatives like 7-deazaG can be used to
decrease the stability of G-C base pairs. The initial experiments shown in Table 2
indicate that the use of enzymes will eliminate many of the complications due to base
sequences. This gives the approach a very significant advantage over non-enzymatic
methods which require different conditions for each nucleic acid and are highly matched
to GC content.

Another method to compensate for differences in stability is to vary the
base next to the stacking site. Experiments are performed to test the relative effects of
all four bases in this position on overall hybridization discrimination and also on relative
ligation discrimination. Base analogs such as dU (deoxyuridine) and 7-deazaG are also
tested as components of the target DNA to see if these can suppress effects of secondary
structure. Single-stranded binding proteins may also be helpful in this regard.

Example 16 

Data measurement, processing and interpretation. Highly automated
methods for raw data handling and generation of contiguous DNA sequence from the
hybridization are required for analysis of the data. Two methods of data acquisition
have been used in prior SBH efforts: CCD cameras with fluorescent labels and image
plate analyzers with radiolabeled samples. The latter method has the advantage that
there is no problem with uniform sampling of the array. However, it is effectively
limited to only two color analysis of DNA samples, by the use of 35S and 32P,
differentially imaged through copper foil. In contrast, while CCD cameras are less well
developed, the detection of many colors is possible by the use of appropriate exciting
sources and filters. Four colors are available with conventional fluorescent DNA
sequencing primers or terminators. More than four colors may be achievable if
infrared dyes are used. However, providing uniform excitation of the fluorescent array is not
a trivial problem. Both detection schemes are used and the image plate analyzers are
sure to work. The CCD camera approach will be necessary if some of the multicolor
labeling schemes described in the proposal are ever to be realized. Label will be
introduced into targets by standard enzymatic methods, such as the use of 5' labeled
PCR primers for 5' labeling, internally alpha-32P labeled triphosphates or
fluorescent-labeled base analogs for internal labeling, and similar compounds by filling in
staggered DNA ends for 3' labeling.

Both the Molecular Dynamics image plate analyzer and the Photometrics
cooled CCD camera can deal with the same TIFF 8 bit data format. Thus, software
developed for either instrument can be used to handle data measured on both
instruments. This will save a great deal of unnecessary duplication in data processing
software. Efforts to develop sequence interpretation software for reading sequencing
chip data and assembling it into contiguous sequence are already underway in Moscow,
at Argonne National Laboratory, and in the private sector. Such software is generally
available in the interested user community. The most useful examples of this software
can be customized to fit the particular needs of this approach, including
polychromatic detection, incorporation of positional information, and pooling schemes.
Specific software for constructing and decoding the orthogonal pools of
samples that may ultimately be used is being developed because these procedures are
also needed for enhanced physical mapping methods.



Example 17

Generation of master beads. The general procedure for the generation
of master beads is depicted in Figure 14. Forty microliters of Dynabeads M-280
streptavidin were washed twice with 80 µl of TE (bead concentration of 5 mg/ml). Final
concentration of beads was about 5-10 pmoles of biotinylated oligo for 40 µg of beads in
a total volume of 80 µl. Each test oligo, in the form 5'-biotin-N1 N2 N3 N4 N5-10bp-3',
was dissolved in TE to a concentration of 10 pmol/40 µl (250 nM). Eighty microliters of
oligo were added and the mixture shaken gently for 15 minutes in a vortex at low speed.



Table 6

Stock solutions of MPROBE N in 1 ml TE pH 7.5

MPROBEA     94 µg      12,200 pmol     20 µl in 1 ml
MPROBEC     nifig      ISJ^M pmol      16 µl in 1 ml
MPROBEG     94 µg      12,300 pmol     20 µl in 1 ml
MPROBET     147 µg     19,200 pmol     13 µl in 1 ml

Stock solution of MCOMPBIO in 5 ml TE pH 7.5

MCOMPBIO               464,000 pmol    5 µl in 1.85 ml

Tubes were placed in the Dynal MPC apparatus and the supernatant
removed. Unbound streptavidin sites were sealed with 5 µl of 200 mM free biotin in
water. The beads were washed several times with 80 µl TE. These beads can be stored
in this state at 4°C for several weeks.

250 nM of 5'-biotinylated 18 base nucleic acid (the complement of the
constant region) served as primer for enzymatic extension of the probe region. The tube
was heated to 68°C and allowed to cool to room temperature. Beads were kept in
suspension by tapping gently. Supernatant was removed and the beads washed with 40 µl TE
several times. The tube was removed from the magnet and the beads resuspended in
40 µl of TE to remove excess complement. The bead suspension was equally divided
among 4 tubes and the stock tube washed, with the wash divided among the tubes as
well. Supernatant was removed and the beads washed with water. Each tube contained
about 2-5 pmol of DNA (28-72 ng; see Table 6).
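
The dilutions in Table 6 appear to be chosen so that each working solution is close to 250 nM. The sketch below checks this under stated assumptions: the table columns are read as total amount of oligonucleotide in the stock volume, followed by the aliquot and the final volume, and the MCOMPBIO stock volume is read as 5 ml. The function name is hypothetical, and the MPROBEC row is omitted because its values are not legible in the source.

```python
def diluted_nM(stock_pmol, stock_volume_ul, aliquot_ul, final_volume_ul):
    """Concentration (nM) after diluting an aliquot of stock to a final volume."""
    stock_uM = stock_pmol / stock_volume_ul          # pmol/µl equals µM
    return stock_uM * aliquot_ul / final_volume_ul * 1000


# (amount in pmol, stock volume in µl, aliquot in µl, final volume in µl)
WORKING_SOLUTIONS = {
    "MPROBEA": (12_200, 1000, 20, 1000),
    "MPROBEG": (12_300, 1000, 20, 1000),
    "MPROBET": (19_200, 1000, 13, 1000),
    "MCOMPBIO": (464_000, 5000, 5, 1850),   # 5 ml stock volume is an assumed reading
}

if __name__ == "__main__":
    for name, args in WORKING_SOLUTIONS.items():
        print(f"{name}: {diluted_nM(*args):.1f} nM")
    # The test oligo concentration quoted in the text: 10 pmol in 40 µl
    print(f"test oligo: {10 / 40 * 1000:.0f} nM")
```
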

Polymerase I extension was performed on each tube of DNA in a total of
13 µl as follows (see Table 7): NEB buffer concentration was 10 mM Tris-HCl, pH 7.5,
5 mM MgCl2, 7.5 mM DTT; 33 µM d(N-N1)TP mix; 2 µM 32P dN1TP complementary to
one of the N1 bases; and polymerase I large fragment (Klenow). In the first well was
added dTTP, dCTP and dGTP to a concentration of 33 µM. 32P-dATP was added to a
concentration of 3 µM. dNTP stock solutions of 200 µM were pooled to lack the labeled
nucleotide (i.e., Tube A contains C, G and T), adding 6.3 µl dNTP. Radioactively
labeled dNTP stock solutions were 20 µM, prepared from
2 µl [alpha-32P] dNTP, 5 µl 200 µM dNTP, and 43 µl water.

Table 7

Tube            A         C         G         T
10X buffer      1.3 µl    1.3 µl    1.3 µl    1.3 µl
dATP            -         2.1 µl    2.1 µl    2.1 µl
dCTP            2.1 µl    -         2.1 µl    2.1 µl
dGTP            2.1 µl    2.1 µl    -         2.1 µl
dTTP            2.1 µl    2.1 µl    2.1 µl    -
Enzyme          5 U       5 U       5 U       5 U
Labeled dNTP    1.9 µl    1.9 µl    1.9 µl    1.9 µl

The tubes were incubated at 25°C for 15 minutes. To optimize the yields
of enzymatic extension, higher concentrations of dNTPs and longer reaction time may
be required. The reaction was stopped by adding 4 µl of 50 mM EDTA to a final
concentration of 11 mM. The supernatant was removed and the beads rinsed with 40 µl
of TE buffer several times and resuspended in 35 µl of TE. The whole tube was counted
and it was expected that there would be about 8% incorporation of the label added.

As a test of the synthesized oligo transfer, magnetic beads were suspended
in 50 µl of 0.1 M NaOH and incubated at room temperature for 10 minutes. The
supernatant from each tube was removed and transferred to a fresh tube. Beads were
incubated a second time with 50 µl of 0.1 M NaOH. As many counts seemed to remain,
the first set of beads was heated to 65°C in 50 µl NaOH, which leached out a lot more
counts. Each base was neutralized with 1 M HCl followed by 50 µl of TE. Fresh
Dynabeads were added to the melted strand and incubated at room temperature for 15 minutes
with gentle shaking. Supernatants were removed and saved for counting. The beads
were washed several times with TE. Results are shown in Table 8.

Table 8

Incorporation of label (MPROBEC 5'-CATGG...)

A       28,711 / 779,480
C       35,193 / 574,760
G       15,335 / 754,400
T       43,048 / 799,440

        Transferred    Non bound    Unmelted    Efficiency
A        9,812         2,330        10,419      43.4%
C       13,158         3,950         8,494      51.4%
G        5,621         2,672         1,924      55.0%
T       15,898         5,287         5,942      58.6%

Transferred refers to synthesized strand captured on fresh beads.
Non bound refers to the synthesized strand that was not captured by the bead and
unmelted refers to counts remaining on the original beads. As can be observed,
between about 43% and 58% of the newly synthesized strands were successfully
transferred, indicating that an array of such strands could be successfully replicated.
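
The efficiencies in Table 8 are reproduced if efficiency is taken as the transferred counts divided by the sum of transferred, non bound and unmelted counts. That definition is inferred from the numbers rather than stated in the text, and several counts are read from a degraded scan, so the sketch below should be treated as a consistency check only.

```python
# (transferred, non bound, unmelted) counts per tube, as read from Table 8
COUNTS = {
    "A": (9_812, 2_330, 10_419),
    "C": (13_158, 3_950, 8_494),
    "G": (5_621, 2_672, 1_924),
    "T": (15_898, 5_287, 5_942),
}

REPORTED_PERCENT = {"A": 43.4, "C": 51.4, "G": 55.0, "T": 58.6}


def transfer_efficiency(transferred, non_bound, unmelted):
    """Percent of recovered counts that ended up on the fresh beads."""
    return 100 * transferred / (transferred + non_bound + unmelted)


if __name__ == "__main__":
    for tube, counts in COUNTS.items():
        calc = transfer_efficiency(*counts)
        print(f"{tube}: calculated {calc:.1f}%, reported {REPORTED_PERCENT[tube]}%")
```
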

Example 18 

A procedure for making complex arrays by PCR. A slightly complex, but
considerably improved scheme to test the generality of the new approach to SBH,
without the need to synthesize, separately, all 1024 five-mer probes has been developed.
This procedure allows one to generate arrays with 5'- and/or 3'-overhangs and uses PCR
to prepare the final probes used for hybridization, which may easily be labeled with
biotin. It also builds in a way of learning part or even all of the identity of each probe
sequence.

Chemical synthesis was used to make the following sequences:

(a) 5'-GTCGACAGTTGACGCTACCAYNNNNRTGGTCTAGAGCTAGC-3' (SEQ ID NO 15)

(b) 5'-CTCGAGAGTTGACGCTACCARNNNNYTGGTCTAGACCCGGG-3' (SEQ ID NO 16)

Next, enzymatic extension of the appropriate primers using a DNA
polymerase in the presence of high concentrations of dNTPs was used to make the
complementary duplexes. In the above sequences, N represents an equimolar mixture
of all 4 bases; R is an equimolar mixture of A and G; and Y is an equimolar mixture
of T and C. The underlined sequences are Bst XI and Hga I recognition sites.

[Diagram: duplexes (a) and (b) formed by primer extension]


The sequences were designed with an internal Bst XI-cutting site, which
allows for the generation of complementary, 4 base 3'-overhanging single-strands which
can be converted to 5 base 3'-overhangs (see below) used for the type of positional SBH
shown in Figure 2A.

(SEQ ID NO 21) 5'-CCANNNNNNTGG-3'   Bst XI   5'-CCANNNNN    NTGG-3'
(SEQ ID NO 22) 3'-GGTNNNNNNACC-5'     ->     3'-GGTN    NNNNNACC-5'

The Hga I-cutting site overlaps with the Bst XI-cutting site and allows for
the generation of 5 base 5'-overhanging single-strands. This is the structure needed for
the type of positional SBH shown in Figure 2B, and can also be used for subsequent
sequencing of the overhangs by primer extension.

(SEQ ID NO 23) 5'-GACGCNNNNNNNNNN-3'   Hga I   5'-GACGCNNNNN     NNNNN-3'
(SEQ ID NO 24) 3'-CTGCGNNNNNNNNNN-5'     ->    3'-CTGCGNNNNNNNNNN     -5'

The 5'- and 3'-terminal sequences of strand (a) are also recognition sites
for Sal I and Nhe I, respectively; the corresponding sequences in strand (b) are
recognition sites for Xho I and Xma I, respectively:

5'-GTCGAC-3'   Sal I   5'-G     TCGAC-3'
3'-CAGCTG-5'    ->     3'-CAGCT     G-5'

5'-GCTAGC-3'   Nhe I   5'-G     CTAGC-3'
3'-CGATCG-5'    ->     3'-CGATC     G-5'

5'-CTCGAG-3'   Xho I   5'-C     TCGAG-3'
3'-GAGCTC-5'    ->     3'-GAGCT     C-5'

5'-CCCGGG-3'   Xma I   5'-C     CCGGG-3'
3'-GGGCCC-5'    ->     3'-GGGCC     C-5'

These cloning sites are chosen such that, even with the degeneracy allowed
by the sequences 5'-YNNNNR-3' and 5'-RNNNNY-3', these enzymes will not cleave
the probe regions. For cloning, duplexes (a) were cleaved with both Sal I and Nhe I
restriction enzymes (or duplexes (b) with Xho I and Xma I). The resulting digestion
products were directionally cloned into an appropriate vector (e.g., plasmid, phage, etc.),
suitable cells were transformed with the vector, and colonies plated. Individual clones
were picked and their DNA amplified by PCR using vector sequences downstream and
upstream from the cloned sequences as the primers. This was done to increase the
length of the PCR products to ease the manipulation of these products. The probe
regions from individual clones were amplified by PCR with one biotinylated primer
corresponding to the 5'-bases of the bottom strand. In a separate PCR, the locations
of the biotins were reversed. The resulting PCR products in each case were cleaved
with Bst XI and the biotin-labeled products captured on streptavidin beads or surfaces.
Note that by using PCR amplification instead of DNA purification, the need to
separately purify and biotinylate each clone is also eliminated.
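
The pairing of cloning enzymes with strands can be checked against the degenerate probe patterns. The sketch below is illustrative, with hypothetical names, and tests only whether a six base recognition site could coincide exactly with the six base degenerate region; sites spanning the flanking constant sequence are not examined.

```python
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"},            # purine
    "Y": {"C", "T"},            # pyrimidine
    "N": {"A", "C", "G", "T"},  # any base
}


def compatible(site, pattern):
    """True if the concrete site is one of the sequences allowed by the pattern."""
    return len(site) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(site, pattern)
    )


SITES = {"Sal I": "GTCGAC", "Nhe I": "GCTAGC", "Xho I": "CTCGAG", "Xma I": "CCCGGG"}

if __name__ == "__main__":
    for enzyme, site in SITES.items():
        print(enzyme,
              "YNNNNR:", compatible(site, "YNNNNR"),
              "RNNNNY:", compatible(site, "RNNNNY"))
```

Sal I and Nhe I sites begin with a purine and so cannot match 5'-YNNNNR-3' in strand (a), while Xho I and Xma I sites begin with a pyrimidine and cannot match 5'-RNNNNY-3' in strand (b). Swapping the enzyme pairs between strands would not have this property.
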

In parallel, all the PCR products were cleaved by Hga I, which generates
5'-overhangs consisting of randomized sequences. The identity of each clone can then
be determined by separate primer extensions of each of the two DNA pieces resulting
from Hga I cleavage. For each pair of sequences which derive from the same clone,
the overhangs must be complementary. Therefore, sequencing just three bases on each
fragment strand will give the entire structure of two probes. This plus/minus
sequencing can be done in microtiter plates and is easily automated. It will fail only in
the few cases where 5'-RNNNNY-3' in strand (b) contains 5'-GACGC-3', which is the
recognition site for Hga I. The number of primer extension reactions required can be
reduced by synthesis of more restricted pools of sequences, for example, using 4 pools
where the base in one particular position is known in advance, such as 5'-YNNANR-3'.
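
How few these failing cases are can be estimated by enumeration. The following sketch is illustrative, with hypothetical names; it expands each degenerate pattern into its 1024 concrete sequences and counts those containing 5'-GACGC-3' on the strand as written. Occurrences on the complementary strand or spanning the flanking sequence are not considered.

```python
from itertools import product

IUPAC = {"R": "AG", "Y": "CT", "N": "ACGT"}


def expand(pattern):
    """All concrete sequences represented by a degenerate pattern."""
    choices = (IUPAC.get(code, code) for code in pattern)
    return ["".join(p) for p in product(*choices)]


def containing(pattern, site):
    """Concrete sequences of the pattern that contain the given site."""
    return [seq for seq in expand(pattern) if site in seq]


if __name__ == "__main__":
    hga_site = "GACGC"
    print(len(expand("RNNNNY")))                # 1024
    print(containing("RNNNNY", hga_site))
    print(len(containing("YNNNNR", hga_site)))  # 0
```

Under these assumptions only four of the 1024 strand (b) probe sequences (GACGCC, GACGCT, AGACGC and GGACGC) contain the site, and none of the strand (a) sequences do.
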

To make the probes needed for positional SBH (as shown in Figure 2A),
the duplex PCR products are first attached to a solid support through streptavidin. They
are then cleaved with Bst XI to generate the following pairs of products:

5'-B-GTCGACAGTTGACGCTACCAYNNNN-3'   (SEQ ID NO 25)
3'-  CAGCTGTCAACTGCGATGGTR-5'       (SEQ ID NO 26)

5'-B-GCTAGCTCTAGACCAYNNNN-3'        (SEQ ID NO 27)
3'-  CGATCGAGATCTGGTR-5'            (SEQ ID NO 28)

5'-B-CTCGAGAGTTGACGCTACCARNNNN-3'   (SEQ ID NO 29)
3'-  GAGCTCTCAACTGCGATGGTY-5'       (SEQ ID NO 30)

5'-B-CCCGGGTCTAGACCARNNNN-3'        (SEQ ID NO 31)
3'-  GGGCCCAGATCTGGTY-5'            (SEQ ID NO 32)

The 5 base 3' overhangs needed for positional SBH are made by replacing
the complementary (non-biotinylated) strands with constant strands which are one base
shorter.

5'-B-GTCGACAGTTGACGCTACCAYNNNN-3'   (SEQ ID NO 25)
3'-  CAGCTGTCAACTGCGATGGT-5'        (SEQ ID NO 33)

5'-B-GCTAGCTCTAGACCAYNNNN-3'        (SEQ ID NO 27)
3'-  CGATCGAGATCTGGT-5'             (SEQ ID NO 34)

5'-B-CTCGAGAGTTGACGCTACCARNNNN-3'   (SEQ ID NO 29)
3'-  GAGCTCTCAACTGCGATGGT-5'        (SEQ ID NO 35)

5'-B-CCCGGGTCTAGACCARNNNN-3'        (SEQ ID NO 31)
3'-  GGGCCCAGATCTGGT-5'             (SEQ ID NO 36)

This generates the 5 base 3'-overhanging arrays amenable to extension
with Sequenase version 2.0 after the ligation step shown in Figures 2A and B.
Randomly chosen arrays of 5,120 (5X coverage) are needed to ensure that all of the
sequences (>99%) are present, but this array is much larger than optimal. In practice,
a library will need only provide approximately 63% of the sequences and, if necessary,
can be supplemented to fill in the missing variable clones by direct synthesis.
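
The coverage figures quoted above are consistent with random sampling of clones from 1024 equally likely sequences. A minimal sketch under that assumption follows; the function name is hypothetical, and real libraries will deviate from uniform representation.

```python
import math


def expected_fraction_present(n_clones, n_sequences):
    """Expected fraction of distinct sequences seen at least once when
    n_clones are drawn uniformly at random from n_sequences possibilities."""
    return 1 - (1 - 1 / n_sequences) ** n_clones


if __name__ == "__main__":
    n = 4 ** 5  # 1024 possible five base overhangs
    for coverage in (1, 5):
        frac = expected_fraction_present(coverage * n, n)
        approx = 1 - math.exp(-coverage)  # large-library approximation
        print(f"{coverage}X: {frac:.3f} (approximation {approx:.3f})")
```

At 1X coverage about 63% of the sequences are expected to be present, and at 5X coverage more than 99%, matching the values in the text.
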

Other embodiments and uses of the invention will be apparent to those
skilled in the art from consideration of the specification and practice of the invention
disclosed herein. It is intended that the specification and examples be considered
exemplary only, with the true scope and spirit of the invention being indicated by the
following claims.



SEQUENCE LISTING

(1) GENERAL INFORMATION:

     (i) APPLICANT: CANTOR, Charles
                    PRZETAKIEWICZ, Marek

    (ii) TITLE OF INVENTION: POSITIONAL SEQUENCING BY HYBRIDIZATION

   (iii) NUMBER OF SEQUENCES: 36

    (iv) CORRESPONDENCE ADDRESS:
          (A) ADDRESSEE: BAKER & BOTTS, L.L.P.
          (B) STREET: 555 13th Street, N.W., Suite 500 East
          (C) CITY: Washington
          (D) STATE: D.C.
          (E) COUNTRY: USA
          (F) ZIP: 20004-1109

     (v) COMPUTER READABLE FORM:
          (A) MEDIUM TYPE: Floppy disk
          (B) COMPUTER: IBM PC compatible
          (C) OPERATING SYSTEM: PC-DOS/MS-DOS
          (D) SOFTWARE: PatentIn Release #1.0, Version #1.25

    (vi) CURRENT APPLICATION DATA:
          (A) APPLICATION NUMBER: US 08/110,691
          (B) FILING DATE: 23-AUG-1993
          (C) CLASSIFICATION:

   (vii) PRIOR APPLICATION DATA:
          (A) APPLICATION NUMBER: US 07/972,012
          (B) FILING DATE: 06-NOV-1992

  (viii) ATTORNEY/AGENT INFORMATION:
          (A) NAME: Remenick, James
          (B) REGISTRATION NUMBER: 36,902
          (C) REFERENCE/DOCKET NUMBER: 16865-0124

    (ix) TELECOMMUNICATION INFORMATION:
          (A) TELEPHONE: (202) 639-7721
          (B) TELEFAX: (202) 639-7832



(2) INFORMATION FOR SEQ ID NO:1:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 15 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (genomic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
TCGGTTCCAA GAGCT

(2) INFORMATION FOR SEQ ID NO:2:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 18 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (genomic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:
CTGATGCGTC GGATCATC

(2) INFORMATION FOR SEQ ID NO:3:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 23 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (genomic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
GATGATCCGA CGCATCAGAG CTC

(2) INFORMATION FOR SEQ ID NO:4:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 23 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (genomic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
GATGATCCGA CGCATCAGAG CTT

(2) INFORMATION FOR SEQ ID NO:5:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 23 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (genomic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
GATGATCCGA CGCATCAGAG CTA

(2) INFORMATION FOR SEQ ID NO:6:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 23 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (genomic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:
GATGATCCGA CGCATCAGAG CCC

(2) INFORMATION FOR SEQ ID NO:7:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 23 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (genomic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:
GATGATCCGA CGCATCAGAG TTC

(2) INFORMATION FOR SEQ ID NO:8:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 23 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (genomic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
GATGATCCGA CGCATCAGAA CTC

(2) INFORMATION FOR SEQ ID NO:9:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 23 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (genomic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
GATGATCCGA CGCATCAGAG ATC

(2) INFORMATION FOR SEQ ID NO:10:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 23 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (genomic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
GATGATCCGA CGCATCAGAG CTT
(2) INFORMATION FOR SEQ ID NO:11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
GATGATCCGA CGCATCAGAT ATC
(2) INFORMATION FOR SEQ ID NO:12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:
GATGATCCGA CGCATCAGAT ATT
(2) INFORMATION FOR SEQ ID NO:13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: 
GATGATCCGA CGCATCAGAG CTCT
(2) INFORMATION FOR SEQ ID NO:14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
GATGATCCGA CGCATCAGAG TTCT 24
(2) INFORMATION FOR SEQ ID NO:15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

GTCGACAGTT GACGCTACCA YNNNNRTGGT CTAGAGCTAG C 41
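SEQ ID NO:15 contains degenerate base codes. As a rough illustration, and assuming the standard IUPAC meanings (R = A or G, Y = C or T, N = any base), which this listing does not itself define, the following Python sketch counts how many distinct sequences such an entry stands for. The name `degeneracy` is hypothetical and used only for this example.

```python
from math import prod

# Assumed IUPAC meanings; only the codes that occur in this listing are included.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "N": "ACGT",  # any base
}

def degeneracy(seq: str) -> int:
    """Return the number of distinct sequences a degenerate sequence represents."""
    bases = seq.replace(" ", "")
    return prod(len(IUPAC[base]) for base in bases)

# SEQ ID NO:15 as printed in the listing (spaces are grouping only).
SEQ_15 = "GTCGACAGTT GACGCTACCA YNNNNRTGGT CTAGAGCTAG C"

print(len(SEQ_15.replace(" ", "")), degeneracy(SEQ_15))
```

Under these assumptions the 41-base entry represents 2 x 4^4 x 2 = 1024 individual sequences, one for each choice of the Y, NNNN and R positions.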

(2) INFORMATION FOR SEQ ID NO:16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16: 
CTCGAGAGTT GACGCTACCA RNNNNYTGGT CTAGACCCGG G 41

(2) INFORMATION FOR SEQ ID NO:17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:
GCTAGCTCTA GA 12



(2) INFORMATION FOR SEQ ID NO:18:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 41 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:
GCTAGCTCTA GACCAYNNNN RTGGTAGCGT CAACTGTCGA C 41



(2) INFORMATION FOR SEQ ID NO:19: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:
CCCGGGTCTAGA

(2) INFORMATION FOR SEQ ID NO:20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:
CCCGGGTCTA GACCARNNNN YTGGTAGCGT CAACTCTCGA G 41

(2) INFORMATION FOR SEQ ID NO:21: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 12 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: 
CCANNNNNNTGG 

(2) INFORMATION FOR SEQ ID NO:22: 
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 12 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:
CCANNNNNNTGG 

(2) INFORMATION FOR SEQ ID NO:23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 
GACGCNNNNN NNNNN 
(2) INFORMATION FOR SEQ ID NO:24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:
NNNNNNNNNN GCGTC 15
(2) INFORMATION FOR SEQ ID NO:25: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:
GTCGACAGTT GACGCTACCA YNNNN 25
(2) INFORMATION FOR SEQ ID NO:26:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:
RTGGTAGCGT CAACTGTCGA C 21
(2) INFORMATION FOR SEQ ID NO:27:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:
GCTAGCTCTA GACCAYNNNN 20
(2) INFORMATION FOR SEQ ID NO:28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:
RTGGTCTAGA GCTAGC 16
(2) INFORMATION FOR SEQ ID NO:29: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:
CTCGAGAGTT GACGCTACCA RNNNN 25 
(2) INFORMATION FOR SEQ ID NO:30: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30:
YTGGTAGCGT CAACTCTCGA G 21
(2) INFORMATION FOR SEQ ID NO:31:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:
CCCGGGTCTA GACCARNNNN
(2) INFORMATION FOR SEQ ID NO:32:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:
YTGGTCTAGA CCCGGG 16








(2) INFORMATION FOR SEQ ID NO:33: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 
TGGTAGCGTC AACTGTCGAC 
(2) INFORMATION FOR SEQ ID NO:34:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34:
TGGTCTAGAG CTAGC
(2) INFORMATION FOR SEQ ID NO:35:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35:
TGGTAGCGTC AACTCTCGAG 20
(2) INFORMATION FOR SEQ ID NO:36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36:
TGGTCTAGAC CCGGG 15
We claim:

1. A method for creating an array of probes comprising the steps of:

a) synthesizing a first set of nucleic acids each comprising a constant
sequence of length C at a 3'-terminus and a random sequence of length
R at a 5'-terminus;

b) synthesizing a second set of nucleic acids each comprising a sequence
complementary to the constant sequence of each of the first nucleic acids;
and

c) hybridizing the first set with the second set to create the array.

2. The method of claim 1 wherein the nucleic acids of the first set are each between
about 15-30 nucleotides in length and the nucleic acids of the second set are each
between about 10-25 nucleotides in length.

3. The method of claim 1 wherein C is between about 7-20 nucleotides and R is
between about 3-5 nucleotides.

4. The method of claim 1 wherein the array comprises about 4^R different probes.

5. The method of claim 1 wherein the array is fixed to a solid support and the solid
support is selected from the group consisting of plastics, ceramics, metals, resins, gels,
membranes and chips.

6. An array of probes created by the method of claim 1.
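The construction recited in claims 1 to 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the constant sequence `CONSTANT` is a made-up 12-mer, the function names are hypothetical, and hybridization is modelled simply as pairing each first-set strand with the reverse complement of its constant region.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]

def build_probe_array(constant: str, r: int) -> list[dict]:
    """Enumerate partial duplexes: random R-mer at the 5' end, constant at the 3' end."""
    second = reverse_complement(constant)  # second-set strand, same for every probe
    probes = []
    for mer in product("ACGT", repeat=r):
        overhang = "".join(mer)
        probes.append({
            "first": overhang + constant,  # first-set strand, 5'->3'
            "second": second,              # pairs with the constant region only
            "overhang": overhang,          # single-stranded random 5' portion
        })
    return probes

CONSTANT = "GACGCTACCATG"  # hypothetical constant sequence, C = 12
probes = build_probe_array(CONSTANT, r=4)
print(len(probes))  # 4**4 probes for R = 4
```

Each entry pairs a first-set strand with the shared second-set strand, leaving the random R-mer single-stranded, so the number of distinct probes is 4^R as stated in claim 4.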

7. A method for creating an array of probes fixed to a solid support comprising the
steps of:

a) synthesizing a first set of nucleic acids each comprising a constant
sequence of length C at a 3'-terminus and a random sequence of length
R at a 5'-terminus;

b) fixing the first set to the solid support;

c) synthesizing a second set of nucleic acids each comprising a sequence
complementary to the constant region of the first set; and

d) hybridizing the nucleic acids of the first set with the second set to create
the array.

8. A method for creating an array of probes comprising the steps of:

a) synthesizing an array of single-stranded nucleic acids each containing a
constant sequence at the 3'-terminus, another constant sequence at the
5'-terminus, and a random internal sequence of length R flanked by the
cleavage sites of a restriction enzyme;

b) synthesizing an array of primers each complementary to a portion of the
constant sequence of the 3'-terminus, hybridizing the two arrays together
to form hybrids;

c) extending the sequence of each primer by polymerization using a
sequence of the nucleic acid as a template; and

d) cleaving the extended hybrids with the restriction enzyme to form an array
of probes with a double-stranded portion at one terminus, a
single-stranded portion containing the random sequence at the opposite
terminus.

9. The method of claim 8 wherein the nucleic acids are each between about 10-50
nucleotides in length.

10. The method of claim 8 wherein R is between about 3-5 nucleotides in length.

11. The method of claim 8 wherein the restriction enzyme is selected from the group
consisting of restriction enzymes which produce 5'-overhangs and restriction enzymes
which produce 3'-overhangs.

12. The method of claim 8 wherein the array of probes is fixed to a solid support and
the solid support is selected from the group consisting of plastics, ceramics,
metals, resins, gels, membranes and chips.

13. An array of probes created by the method of claim 8.

14. A method for creating an array of probes comprising the steps of:

a) synthesizing an array of single-stranded nucleic acids each containing a
constant sequence at the 3'-terminus, another constant sequence at the
5'-terminus, and a random internal sequence of length R flanked by the
cleavage sites of a restriction enzyme;

b) synthesizing an array of primers with a sequence complementary to the
constant sequence at the 3'-terminus;

c) hybridizing the two arrays together to form hybrids;

d) enzymatically extending the primers using the nucleic acids as templates
to form full-length hybrids;

e) cloning the full-length hybrids into vectors;

f) amplifying the cloned sequences by multiple polymerase chain reactions;
and

g) cleaving the amplified sequences with the restriction enzyme to form the
array of probes with a double-stranded portion at one terminus and a
single-stranded portion containing the random sequence at the opposite
terminus.

15. The method of claim 14 wherein the array of probes has 5'- or 3'-overhangs.

16. The method of claim 14 wherein the array of probes is fixed to a solid support
and the solid support is selected from the group consisting of plastics, ceramics, metals,
resins, polymers, films, gels, membranes and chips.

17. An array of probes created by the method of claim 14.

18. A method for detecting a nucleic acid in a biological sample comprising the steps
of:

a) creating an array of probes fixed to a solid support according to the
method of claim 7;

b) labeling the nucleic acid of the biological sample with a detectable label;

c) hybridizing the labeled nucleic acid to the array; and

d) detecting the sequence of the nucleic acid from a binding pattern of the
label on the array.

19. A method for identifying a target nucleic acid in a biological sample comprising
the steps of:

a) creating an array of probes fixed to a solid support according to the
method of claim 7;

b) labeling the target of the biological sample with a detectable label;

c) hybridizing the labeled target to the array; and

d) identifying the target from a binding pattern of the label on the array.

20. The method of claim 19 wherein the detectable label is selected from the group
consisting of radioisotopes, stable isotopes, enzymes, fluorescent and luminescent
chemicals, chromatic chemicals, metals, electric charges, and spatial chemicals.

21. The method of claim 19 wherein the nucleic acid identified is selected from the
group consisting of nucleic acids derived from viruses, bacteria, parasites, fungi and
yeast.

22. The method of claim 19 wherein the binding pattern is a nucleic acid fingerprint.

23. A diagnostic aid for detecting a target nucleic acid in a biological sample
comprising the array of claim 19, a solid support on which the array is fixed, a
detectable label, and the biological sample.

24. The method of claim 19 wherein the biological sample is selected from the group
consisting of samples of animal tissue, environmental substances, and manufacturing
products and by-products.

25. The method of claim 24 wherein the animal tissue is obtained from a human.

26. The method of claim 19 further comprising the step of purifying the target
nucleic acids identified.

27. A method for replicating an array of single-stranded probes on a solid support
comprising the steps of:

a) synthesizing an array of nucleic acids each comprising a constant sequence
of length C at a 3'-terminus and a random sequence of length R at a
5'-terminus;

b) fixing the array to a first solid support;

c) synthesizing a set of nucleic acids each comprising a sequence
complementary to the constant sequence;

d) hybridizing the nucleic acids of the set with the array;

e) enzymatically extending the nucleic acids of the set using the random
sequences of the array as templates;

f) denaturing the set of extended nucleic acids; and

g) fixing the denatured nucleic acids of the set to a second solid support to
create the replicated array of single-stranded probes.

28. The method of claim 27 wherein the nucleic acids of the set are conjugated with
biotin and the second solid support comprises streptavidin.

29. The method of claim 27 wherein the nucleic acids of the array are between about
15-30 nucleotides in length and the nucleic acids of the set are between about 10-25
nucleotides in length.

30. The method of claim 27 wherein C is between about 7-20 nucleotides and R is
between about 3-5 nucleotides.

31. The method of claim 27 wherein the solid support is selected from the group
consisting of plastics, ceramics, metals, resins, gels, membranes and chips.

32. The method of claim 27 wherein the nucleic acids of the set are enzymatically
extended with a DNA polymerase and one or more deoxynucleotide triphosphates.

33. The method of claim 27 wherein denaturing is performed with heat, alkali,
organic solvents, binding proteins, enzymes, salts or combinations thereof.

34. A replicated array of single-stranded probes made by the method of claim 27.

35. The method of claim 27 further comprising the step of hybridizing the replicated
array with a second set of nucleic acids complementary to the constant sequence of the
replicated array to create a double-stranded replicated array.

36. A replicated array of double-stranded probes made by the method of claim 35.
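The replication steps of claim 27 can be modelled in a few lines of Python. The sketch below is illustrative only and rests on assumptions not stated in the claims: the constant sequence is a made-up example, extension is taken to run to the end of the template, and the function names (`master_array`, `replicate`) are hypothetical.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]

def master_array(constant: str, r: int) -> list[str]:
    """Templates with a random R-mer at the 5' end and the constant at the 3' end."""
    return ["".join(mer) + constant for mer in product("ACGT", repeat=r)]

def replicate(array: list[str], constant: str) -> list[str]:
    """Anneal a primer to the constant region and extend it across the random region."""
    primer = reverse_complement(constant)
    copies = []
    for template in array:
        random_part = template[: len(template) - len(constant)]
        # Extension adds the complement of the random region to the primer's 3' end.
        copies.append(primer + reverse_complement(random_part))
    return copies

CONSTANT = "GACGCTACCATG"  # hypothetical constant sequence
R = 3
master = master_array(CONSTANT, R)
replica = replicate(master, CONSTANT)
print(len(master), len(replica))
```

In this model each replicated strand is the full reverse complement of its template, so the random sequence sits at the 3' end of the copy. Because the master array contains every R-mer, the set of random sequences in the replica is the same complete set of 4^R sequences.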

37. A method for creating a probe comprising the steps of:

a) synthesizing a plurality of first nucleic acids and a plurality of second
nucleic acids comprising a random terminal sequence and a sequence
complementary to a sequence of the first nucleic acids;

b) hybridizing the first nucleic acids with the second to form partial duplexes;

c) hybridizing a target nucleic acid to the partial duplexes;

d) ligating the hybridized target to the first nucleic acid of the partial
duplexes;

e) isolating the second nucleic acid from the ligated duplexes; and

f) synthesizing a plurality of third nucleic acids each complementary to the
constant sequence of the second nucleic acid and hybridizing the third
nucleic acids with the isolated second nucleic acids to create a probe.

38. The method of claim 37 wherein the first nucleic acids are each between about
15-25 nucleotides in length and the second nucleic acids are each between about 20-30
nucleotides in length.

39. The method of claim 37 wherein the target is hybridized to the partial duplexes
under a single set of hybridization conditions.

40. The method of claim 39 wherein the hybridization conditions comprise a
temperature of between about 22-37°C, a salt concentration of between about 0.05-0.2
M, and a time period of between about 1-14 hours.

41. The method of claim 37 wherein a double-stranded portion of the partial duplex
contains an enzyme recognition site.

42. A probe created by the method of claim 37.

43. The probe of claim 42 which is fixed to a solid support and the solid support is
selected from the group consisting of plastics, ceramics, metals, resins, gels, membranes
and chips.

44. A diagnostic aid for the detection of a target nucleic acid in a biological sample
comprising the probe of claim 42, a solid support on which the probe is fixed, a
detectable label, and the biological sample.

45. A method for creating a probe comprising the steps of:

a) synthesizing a plurality of first nucleic acids and a plurality of second
nucleic acids each comprising a random terminal sequence and a
sequence complementary to the sequence of the first nucleic acids;

b) hybridizing the first nucleic acids with the second nucleic acids to form
partial duplexes having a double-stranded portion and a single-stranded
portion with the random sequence within the single-stranded portion;

c) hybridizing a target nucleic acid to the partial duplexes;

d) ligating the hybridized target to the first nucleic acid of the partial duplex;

e) hybridizing the ligated target with a set of oligonucleotides comprising
random sequences;

f) ligating the hybridized oligonucleotide to the second nucleic acid;

g) isolating the oligonucleotide ligated second nucleic acid; and

h) synthesizing another plurality of first nucleic acids and hybridizing the first
nucleic acids with the isolated second nucleic acid to create the probe.

46. The method of claim 45 wherein the first nucleic acids are each between about
15-25 nucleotides in length, the second nucleic acids are each between about 20-30
nucleotides in length, and the oligonucleotides are each between about 4-20
nucleotides in length.

47. The method of claim 45 wherein the target is hybridized to the partial duplexes
under a single set of hybridization conditions.

48. The method of claim 45 wherein the hybridization conditions comprise a
temperature of between about 22-37°C, a salt concentration of between about 0.05-0.2
M, and a time period of between about 1-14 hours.

49. The method of claim 45 wherein the partial duplexes contain an enzyme
recognition site.

50. A nucleic acid probe created by the method of claim 45.

51. The nucleic acid probe of claim 50 which is fixed to a solid support selected from
the group consisting of plastics, ceramics, metals, resin, gels, membranes and chips.

52. A diagnostic aid for the detection of a target nucleic acid in a biological sample
comprising the probe of claim 45, a solid support on which the probe is fixed, a
detectable label, and the biological sample.

53. A method for creating a probe comprising the steps of:

a) synthesizing a plurality of first nucleic acids and a plurality of second
nucleic acids comprising a random terminal sequence and a sequence
complementary to a sequence of the first nucleic acid;

b) hybridizing the first nucleic acids to the second nucleic acids to form
partial duplexes having a double-stranded portion and a single-stranded
portion with the random nucleotide sequence within the single-stranded
portion;

c) hybridizing a target nucleic acid to the partial duplexes;

d) ligating the hybridized target to the first nucleic acid of the partial duplex;

e) enzymatically extending the second nucleic acid using the target as a
template;

f) isolating the extended second nucleic acid; and

g) synthesizing another first nucleic acid and hybridizing the first nucleic acid
with the isolated and extended second nucleic acid to create a probe.

54. The method of claim 53 wherein the first nucleic acids are each between about
15-25 nucleotides in length and the second nucleic acids are each between about 20-30
nucleotides in length.

55. The method of claim 53 wherein the target is hybridized to the partial duplexes
under a single set of hybridization conditions.

56. The method of claim 55 wherein the hybridization conditions comprise a
temperature of between about 22-37°C, a salt concentration of between about 0.05-0.2
M, and a time period of between about 1-14 hours.

57. The method of claim 53 wherein the double-stranded portion contains an enzyme
recognition site.

58. The method of claim 53 wherein the target nucleic acid is obtained from a
biological sample selected from the group consisting of samples of animal tissue,
environmental substances, and manufacturing products and by-products.

59. A nucleic acid probe created by the method of claim 53.

60. The nucleic acid probe of claim 59 which is fixed to a solid support and the solid
support is selected from the group consisting of plastics, ceramics, metals, resins, gels,
membranes and chips.

61. A diagnostic aid for the detection of a target nucleic acid in a biological sample
comprising the nucleic acid probe of claim 59, a solid support on which the probe is
fixed, a detectable label and the biological sample.

62. An array of 4^R different nucleic acid probes wherein each probe comprises a
double-stranded portion of length D, a terminal single-stranded portion of length S, and
a random nucleotide sequence within the single-stranded portion of length R.

63. The array of claim 62 wherein D is between about 3-20 nucleotides and S is
between about 3-20 nucleotides.

64. The array of claim 62 which is fixed to a solid support wherein the solid support
is selected from the group consisting of plastics, ceramics, metals, resins, gels,
membranes and two-dimensional and three-dimensional matrices.